Learning Git

Git is a distributed revision control system and a source code management system. It was developed by Linus Torvalds for Linux kernel development. The first release was made on April 7, 2005. [Reference – wikipedia - http://en.wikipedia.org/wiki/Git_(software) ]
Git is free software distributed under the terms of the GNU General Public License version 2.

Git can be downloaded from http://git-scm.com/.

Git is similar in purpose to CVS, SVN or Visual SourceSafe. Before going ahead, I assume that you understand why a versioning system is needed.
I have worked with CVS and SVN, and in these the working directory holds the code plus meta information about the repository, such as the base files (files checked out of the server), timestamps, etc. There is a common central server where we check in files after completion and from which we check out files to start working or to pick up changes from others. So we need a central server and network access to it in order to use CVS or SVN.
On the other hand, every Git working directory is a full-fledged Git repository with complete history and full revision tracking capabilities. It does not depend on network access or a central server, although we can still maintain a central server and occasionally push our work to it and pull others' work from it. One such hosting service is GitHub, which we shall see shortly.

Basic concepts

  1. Working directory: It is a directory which is directly under the control of the versioning system. All the files and folders in this directory are also under the control of the versioning system. So we can make the root folder of our project the working directory.
  2. Checkin: When we work on the files in the working directory, we save them time and again. But this doesn't mean that the files are saved in the versioning system. To save a file in the versioning system, we "commit" it (just like we commit data in a database after an insert/update/delete query). We say to the server "Here is my latest file. Replace your copy of the file with my copy." Once we commit, the files are saved on the server, from where others can access them. (In Git, as we will see, a commit is first recorded in the local repository and reaches a server only when we push.)
  3. Checkout: The reverse process of checkin is checkout. When we "checkout", we ask the server "Can you please give me the latest files?" The server returns the latest files it has, and then we start working on them.

I will explain Git step by step, and I expect you to follow the steps as I do them. Make sure of the following:
  • You have already downloaded and installed Git from http://git-scm.com/. We will use the command line version and not the GUI version.
  • You are paying attention to what I say.
  • I am using Windows 7. The OS doesn't matter, just make sure you are using the correct commands to create/delete/rename a directory or file.

Legend

  • All the items marked in blue are the git commands.
  • All items in black background are the ones that you should see on your console when you execute these commands.
  • The items marked in other colours in the black background are the text that would appear with that colour in the console when you execute the mentioned commands.

Cloning a working git repository

Assuming that a Git project is already hosted at some repository whose URL and access have been provided to us, we can clone the remote repository into a local repository using
git clone <repository URL>
This will clone the remote repository, including its history, and create a working copy for us.
The output should be something like
G:\CloneOfFirstGit>git clone https://github.com/AuthorName/MyFirstGit.git
Cloning into 'MyFirstGit'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 3 (delta 0)
Unpacking objects: 100% (3/3), done.
Note: You can test this command once you complete this tutorial using your own URL. Or alternatively, you could clone a public repository.
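If you want to try "git clone" before you have a URL of your own, here is a minimal sketch that works offline. It assumes a throwaway source repository created in a temporary directory; the directory names and the "Demo User" identity are made up for illustration and are not part of this tutorial's project.

```shell
set -eu
work=$(mktemp -d)                      # hypothetical scratch directory
git init -q "$work/source"
cd "$work/source"
git config user.name "Demo User"       # placeholder identity, local to this repository
git config user.email "demo@example.com"
echo "hello" > fileA.txt
git add fileA.txt
git commit -q -m "First Commit."

# Clone by filesystem path; an https:// URL is used in exactly the same position.
cd "$work"
git clone -q source CloneOfFirstGit
cloned_count=$(git -C CloneOfFirstGit rev-list --count HEAD)
echo "commits in clone: $cloned_count"
```

The clone carries the history with it, not just the latest files, which is why the commit count can be read from the clone alone.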

Initializing git working directory

We shall start a new project. Create a new directory called MyFirstGit. Use the command prompt/terminal/shell to navigate to this directory. Once you are in this directory, run the command
git status
The output I get is
G:\MyFirstGit>git status
fatal: Not a git repository (or any of the parent directories): .git
This is how we see the status of files maintained by Git. The error is expected here, because the directory is not a Git repository yet.

Then let us create a git repository here. Type
git init
The output is
G:\MyFirstGit>git init
Initialized empty Git repository in G:/MyFirstGit/.git/

Our git is initialized. Let us check the status once again
G:\MyFirstGit>git status
# On branch master
#
# Initial commit
#
nothing to commit (create/copy files and use "git add" to track)
Git clearly says that there is no file to commit. But it also says that it is "On branch master". In Git, "master" is the name of the default branch that "git init" creates; it is not a name for the working directory itself.
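The same steps can be scripted, as a rough sketch run in a temporary directory (the directory is an assumption for illustration). It reads the branch name back from Git instead of hard-coding "master", since newer Git versions can be configured to start on a differently named branch.

```shell
set -eu
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"

# Before "git init" the status command fails, exactly as in the transcript above.
if git status >/dev/null 2>&1; then before=repository; else before=not-a-repository; fi

git init -q
branch=$(git symbolic-ref --short HEAD)   # name of the branch HEAD points at
echo "before init: $before"
echo "after init: on branch $branch"
```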

Committing files in Git

Let us create two files: fileA.txt and fileB.txt. And then we check the status.
# On branch master
#
# Initial commit
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# fileA.txt
# fileB.txt
nothing added to commit but untracked files present (use "git add" to track)
Note the red coloured file names. These are files in the working directory that are not being tracked by Git. The status output itself says that we need to add these files to track them, using
git add <file name>
So the output of "git add fileA.txt" is
G:\MyFirstGit>git add fileA.txt

G:\MyFirstGit>git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: fileA.txt
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# fileB.txt
The output says (in green colour) that fileA.txt has been added as a new file. But it is under the section "Changes to be committed". This means that the file is staged, but is not yet committed. The section "Untracked files" says (in red colour) that fileB.txt is still untracked.

Anyway, we decide that we should commit fileA.txt at this moment, rather than risk losing the changes. So as you may have guessed, we run a commit command. The syntax looks like
git commit -m "<commit message>"
A commit message is the message that describes this commit. The description will be needed later on to identify changes or to roll back. The output is
G:\MyFirstGit>git commit -m "First Commit."
[master (root-commit) 1397a2a] First Commit.
0 files changed
create mode 100644 fileA.txt
The output says that our file has been committed. So now let us check the status.
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# fileB.txt
nothing added to commit but untracked files present (use "git add" to track)
Oh! Our fileB.txt is still untracked. But that's okay.
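Here is the whole add-and-commit round as a small script, offered as a sketch under a few assumptions: it runs in a temporary directory, uses a placeholder "Demo User" identity, and uses "git status --short", a compact form of the status output that is easier to read in scripts.

```shell
set -eu
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q
git config user.name "Demo User"       # placeholder identity
git config user.email "demo@example.com"

echo "A" > fileA.txt
echo "B" > fileB.txt
git status --short                     # both files are listed with ?? (untracked)

git add fileA.txt
git status --short                     # fileA.txt is now "A " (staged); fileB.txt is still ??

git commit -q -m "First Commit."
remaining=$(git status --short)        # only the untracked file is left to report
echo "$remaining"
```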

[two days later]

We come back to work again. But we forgot what happened last time. So we check the logs. And yes, we type
git log
And it looks like
G:\MyFirstGit>git log
commit 1397a2a08db3d43092340dd20f117dea2751fe1a
Author: Author Name
Date: Thu Feb 14 12:09:17 2013 +0530

First Commit.
Oh, now we can see. We did our "First Commit." This is the message we typed in during our previous commit. We see the date and time, and also the author who committed. There is also a long string in olive colour on the "commit" line. It is the commit's SHA-1 hash, which uniquely identifies the commit. We aren't much worried about it for now.
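To get a feel for that long commit ID, the following sketch (temporary directory and placeholder identity are assumptions) prints the full ID, the abbreviated ID that "git commit" showed, and a one-line log.

```shell
set -eu
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q
git config user.name "Demo User"       # placeholder identity
git config user.email "demo@example.com"
echo "A" > fileA.txt
git add fileA.txt
git commit -q -m "First Commit."

full=$(git rev-parse HEAD)             # full commit ID (40 hex characters with the default SHA-1 format)
short=$(git rev-parse --short HEAD)    # abbreviated ID, a prefix of the full one
subject=$(git log -1 --format=%s)      # just the message of the latest commit
echo "$full"
echo "$short $subject"
git log --oneline                      # one line per commit: short ID and message
```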

Interacting with github

Keep in mind that Git maintains a complete repository locally, on the local machine, right in the working directory. You can check that upon "git init", Git created a folder named ".git". This is the folder where Git keeps all the requisite information. But what if our machine crashes? Of course we can try to recover the data, but what if we can't? What if the machine is stolen or damaged irrecoverably? This is one basic reason why CVS and SVN keep the latest copy on a server. So we now come to a Git server: GitHub. It is a hosting service on the internet that offers free public repositories.

Open https://github.com/. Register an account for yourself.
Once you login, you need to create a repository for yourself. We will create a public repository.
On the home page, on the top menu bar, towards right, you can see a "book" or a "bookmark" kind of icon. Hover over it and it says "Create a new repo". So click on it.
We are greeted with a form. Type the repository name as "MyFirstGit" and the description as "My first git repository." You can give any name and description, but for now, stick to this one. Click the "Create Repository" button.
Alright! We created our repository. But we need our local machine to link to this repository. To link, we need a URL, and the URL is shown on the repository page.

Copy the URL and go to the command prompt. Type
git remote add origin <url>
Replace <url> with the URL that you copied from the website. The URL is something like
https://github.com/HarshvardhanSingh/MyFirstGit.git
The "origin" is a nickname for our remote repository on GitHub.
Note: "master" is not a nickname for the local repository; it is the name of the default branch in it.
The output to "git remote add" command is
G:\MyFirstGit>git remote add origin https://github.com/AuthorName/MyFirstGit.git
That's it! The remote repository is added.
Let us push our fileA.txt to the repository.
git push -u origin master
The command tells Git to push the local branch master to the remote origin, i.e. from our machine to GitHub. The -u flag also records origin/master as the upstream of master. The output is
G:\MyFirstGit>git push -u origin master
Username for 'https://github.com': author.email.id@domain.com
Password for 'https://author.name@domain.com@github.com':
Counting objects: 3, done.
Writing objects: 100% (3/3), 223 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/AuthorName/MyFirstGit.git
* [new branch] master -> master
Branch master set up to track remote branch master from origin.
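Note: Over HTTPS you will be asked for your GitHub username and password every time you interact with the server, unless a credential helper is configured. The password is not echoed as you type it, so typing mistakes are hard to notice and correct. GitHub has since stopped accepting account passwords for Git operations over HTTPS; a personal access token is entered in place of the password.

The output tells us that the data has been sent and that the local branch "master" is set up to track the branch "master" on "origin".

Similarly, to get the data from the server (the counterpart of a checkout in CVS or SVN), we do a "pull". The command is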
git pull origin master
We get
G:\MyFirstGit>git pull origin master
From https://github.com/AuthorName/MyFirstGit
* branch master -> FETCH_HEAD
Already up-to-date.
Okay, it says that we are "up-to-date".
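The push and pull round trip can be rehearsed without a GitHub account. In this sketch a bare repository in a temporary directory stands in for the GitHub repository; that substitution, the directory names and the placeholder identity are all assumptions made for illustration.

```shell
set -eu
work=$(mktemp -d)                          # hypothetical scratch directory
git init -q --bare "$work/remote.git"      # stand-in for the repository on GitHub
git init -q "$work/MyFirstGit"
cd "$work/MyFirstGit"
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"
echo "A" > fileA.txt
git add fileA.txt
git commit -q -m "First Commit."

branch=$(git symbolic-ref --short HEAD)    # "master" in this article; may differ on newer setups
git remote add origin "$work/remote.git"
git push -q -u origin "$branch"            # -u records the upstream branch

upstream=$(git rev-parse --abbrev-ref --symbolic-full-name "@{u}")
echo "$branch tracks $upstream"
git pull -q origin "$branch"               # nothing new to fetch, so this changes nothing
```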

Working with revisions

Each time we perform a commit, we create a revision. It may so happen that we did something and we forgot what. So we can compare current file with the file that was last committed.

So to test this, let us edit and commit fileA.txt again.

G:\MyFirstGit>git status
# On branch master
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: fileA.txt
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# fileB.txt
no changes added to commit (use "git add" and/or "git commit -a")

G:\MyFirstGit>git add fileA.txt

G:\MyFirstGit>git commit -m "Modified fileA.txt"
[master 24136f1] Modified fileA.txt
1 file changed, 1 insertion(+)

G:\MyFirstGit>git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# fileB.txt
nothing added to commit but untracked files present (use "git add" to track)

If you notice, after the commit the status says that we are ahead of origin/master by 1 commit, because the new commit has not been pushed yet. Anyway, that is not what we are looking for. Make another modification in fileA.txt.

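To check the difference between the current file and the last committed file, run the command
git diff HEAD
"HEAD" is a name for the commit that is currently checked out, which is normally the most recent commit on the current branch. I typed it in lower case as "head", which happens to work on Windows because file names there are not case sensitive; on Linux, use "HEAD". The output would be as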
G:\MyFirstGit>git diff head
diff --git a/fileA.txt b/fileA.txt
index a28985b..2f3bdff 100644
--- a/fileA.txt
+++ b/fileA.txt
@@ -1 +1,3 @@
-This is my edit.
\ No newline at end of file
+This is my edit.
+
+See, I have made this modification.
\ No newline at end of file
Once you run the command, you can see for yourself the modifications made.
In this case, I had added a line "See, I have made this modification." at the end of the committed file.

So let us add this modified file to the stage using "git add" and then run the command
git diff --staged
The output looks like
G:\MyFirstGit>git add fileA.txt

G:\MyFirstGit>git diff --staged
diff --git a/fileA.txt b/fileA.txt
index a28985b..2f3bdff 100644
--- a/fileA.txt
+++ b/fileA.txt
@@ -1 +1,3 @@
-This is my edit.
\ No newline at end of file
+This is my edit.
+
+See, I have made this modification.
\ No newline at end of file
The difference is the same as what we saw earlier between the un-added fileA.txt and HEAD.
In "git diff --staged", the "--staged" option compares the stage with the last commit.
The stage (also called the index) is the state in which files have been added using "git add" but have not yet been committed using "git commit". These appear as green coloured files when we do a "git status".
Currently fileA.txt is in the stage. So a "git status" will look like
G:\MyFirstGit>git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: fileA.txt
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# fileB.txt
The green coloured files are said to be in stage.

We do not want to commit this change to the repository. We do not like it. But we added it to the stage using "git add" by mistake. So to "un-add" it, i.e. remove it from the stage, we do a reset using
git reset <file name>
The output would look like
G:\MyFirstGit>git reset fileA.txt
Unstaged changes after reset:
M fileA.txt
The file has now been reset, i.e. it has been removed from the stage; the edit itself is still in the working directory. Check the difference against the stage and against HEAD. "git diff --staged" reports no differences. Why? Because we just took the file out of the stage. But the difference from the previous commit still exists. We get output something like
G:\MyFirstGit>git diff --staged

G:\MyFirstGit>git diff head
diff --git a/fileA.txt b/fileA.txt
index a28985b..2f3bdff 100644
--- a/fileA.txt
+++ b/fileA.txt
@@ -1 +1,3 @@
-This is my edit.
\ No newline at end of file
+This is my edit.
+
+See, I have made this modification.
\ No newline at end of file
But we wish to revert the file altogether. We want it to be in the same state as it was in at the previous commit. So we simply do a checkout. We use the command
git checkout -- <file name>
After the checkout we again do a "git diff head". This gives us no difference, confirming that the changes have been discarded. The file is the same as it was at the last commit.
G:\MyFirstGit>git checkout -- fileA.txt

G:\MyFirstGit>git diff head
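The edit, add, reset and checkout sequence above can be traced with a small script. This is only a sketch: the "state" helper function, the temporary directory and the placeholder identity are my own additions for illustration. It relies on "git diff --quiet", which prints nothing and reports through its exit status whether there are differences.

```shell
set -eu
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q
git config user.name "Demo User"       # placeholder identity
git config user.email "demo@example.com"
echo "This is my edit." > fileA.txt
git add fileA.txt
git commit -q -m "First Commit."

# Hypothetical helper: report working tree vs HEAD, and stage vs HEAD.
state() {
  if git diff --quiet HEAD; then w=same; else w=changed; fi
  if git diff --quiet --staged; then s=empty; else s=staged; fi
  echo "$1: worktree=$w index=$s"
}

echo "See, I have made this modification." >> fileA.txt
state "after edit"
git add fileA.txt
after_add=$(state "after add"); echo "$after_add"
git reset -q -- fileA.txt
after_reset=$(state "after reset"); echo "$after_reset"
git checkout -- fileA.txt
after_checkout=$(state "after checkout"); echo "$after_checkout"
```

Only the last step touches the content of fileA.txt; "git add" and "git reset" merely move the change in and out of the stage.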
We haven't been using fileB.txt for a long time, and it was never added to Git. So let us remove it from the working directory. The following command removes all the files that are not under the control of Git from the working directory.
git clean -f
The "-f" flag is for "force". Git deletes the untracked files from disk; tracked files, including ones with unstaged modifications, are left alone. Be careful, because the deleted files cannot be recovered through Git.
G:\MyFirstGit>git clean -f
Removing fileB.txt

G:\MyFirstGit>git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#
nothing to commit, working directory clean
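Since "git clean -f" deletes files for good, it can help to preview it first. The sketch below uses the "-n" (dry run) option of "git clean" in a temporary directory with a placeholder identity, both assumed for illustration.

```shell
set -eu
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q
git config user.name "Demo User"       # placeholder identity
git config user.email "demo@example.com"
echo "A" > fileA.txt
git add fileA.txt
git commit -q -m "First Commit."

echo "more" >> fileA.txt               # tracked file with an unstaged modification
echo "B" > fileB.txt                   # untracked file

preview=$(git clean -n)                # dry run: lists what would be removed, deletes nothing
echo "$preview"
git clean -q -f                        # now actually delete the untracked file
```

The modified but tracked fileA.txt survives, which shows that the command targets untracked files only.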
But we do not want to work on fileA.txt anymore. So just like we delete files in Linux using the "rm" command, we remove it here. The file is deleted from disk and the deletion is also put on the stage so that it can be committed. It is a combination of two manual steps: deleting the file from disk and then staging that deletion.
git rm <file name>
The output looks like
G:\MyFirstGit>git rm fileA.txt
rm 'fileA.txt'

G:\MyFirstGit>git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# deleted: fileA.txt
#
Now we can simply commit to remove the file from repository as well.

Ignoring files

Many times we work on multiple files, and a few of them are specific and local to us. Examples could be IDE files like .project or the directory where temporary files are created. In such cases it is wise to tell Git not to track them or even think about them. These files will no longer appear even in "git status".
The easiest way to do this is to create a .gitignore file and type in the names of the files you wish to ignore. Wildcards and directories are also supported.
Let us first create a file called fileC.txt. Now check the status.
# On branch master
# Your branch is ahead of 'origin/master' by 2 commits.
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# fileC.txt
nothing added to commit but untracked files present (use "git add" to track)
Then we create a .gitignore file. On Windows, Explorer does not allow this directly, i.e. by creating a new text document and renaming it. But you can open the text document in an editor and then save it as ".gitignore". This works. But remember, you actually need to type those double quotes too. Once the .gitignore file is created, write fileC.txt in it. Now check the status. You can see the difference.
G:\MyFirstGit>git status
# On branch master
# Your branch is ahead of 'origin/master' by 2 commits.
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# .gitignore
nothing added to commit but untracked files present (use "git add" to track)
Of course we see the .gitignore file here. This is because it can be committed and shared between people so that everyone can use the same .gitignore file if they want. But if you do not wish to track the .gitignore file itself, then add another line in it specifying its own name. Simple!
Note: If you are already tracking a file and then add it to .gitignore, the .gitignore entry will have no effect on that file.
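The note about already tracked files can be seen in a short sketch. It assumes a temporary directory and a placeholder identity, and it uses "git rm --cached", which removes a file from tracking while keeping it on disk, as one possible way to make the ignore rule take effect.

```shell
set -eu
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q
git config user.name "Demo User"       # placeholder identity
git config user.email "demo@example.com"
echo "tracked" > fileA.txt
git add fileA.txt
git commit -q -m "First Commit."

echo "scratch" > fileC.txt
printf '%s\n' 'fileC.txt' 'fileA.txt' > .gitignore

status=$(git status --short)           # fileC.txt is hidden; .gitignore itself shows up
tracked=$(git ls-files)                # fileA.txt is still tracked despite the ignore rule
echo "$status"
echo "still tracked: $tracked"

git rm -q --cached fileA.txt           # stop tracking, keep the file on disk
git commit -q -m "Stop tracking fileA.txt"
after=$(git ls-files)                  # nothing is tracked any more
```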

Many times it so happens that we are working on a tracked file and have made a change that we do not want to commit. In such a case, we can ask Git to temporarily ignore changes to this file. To see this in action, create a file called fileD.txt, add it to tracking and commit it. Now modify fileD.txt and see the status. It should look like
G:\MyFirstGit>git status
# On branch master
# Your branch is ahead of 'origin/master' by 3 commits.
#
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: fileD.txt
#
no changes added to commit (use "git add" and/or "git commit -a")
Now to hide its changes from Git, type in
git update-index --assume-unchanged <file name>
Once you run this command for fileD.txt, check the status and see the difference. The file is still tracked, but Git assumes it has not changed and no longer reports the modification.
G:\MyFirstGit>git update-index --assume-unchanged fileD.txt

G:\MyFirstGit>git status
# On branch master
# Your branch is ahead of 'origin/master' by 3 commits.
#
nothing to commit, working directory clean
Now to re-enable change detection, simply run the command
git update-index --no-assume-unchanged fileD.txt
With this, changes to the file will be reported again.
G:\MyFirstGit>git update-index --no-assume-unchanged fileD.txt

G:\MyFirstGit>git status
# On branch master
# Your branch is ahead of 'origin/master' by 3 commits.
#
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: fileD.txt
#
no changes added to commit (use "git add" and/or "git commit -a")
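It is easy to forget which files carry the assume-unchanged flag. As a sketch (temporary directory and placeholder identity assumed), "git ls-files -v" can be used to check: it marks such entries with a lower-case letter.

```shell
set -eu
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q
git config user.name "Demo User"       # placeholder identity
git config user.email "demo@example.com"
echo "D" > fileD.txt
git add fileD.txt
git commit -q -m "Added fileD.txt"
echo "local tweak" >> fileD.txt

git update-index --assume-unchanged fileD.txt
hidden=$(git status --short)           # empty: the modification is not reported
flag=$(git ls-files -v fileD.txt)      # lower-case "h" marks an assume-unchanged entry
echo "$flag"

git update-index --no-assume-unchanged fileD.txt
shown=$(git status --short)            # the modification is reported again
echo "$shown"
```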

Tagging

Tagging is the process of marking a snapshot of the repository. A tag is a named reference to a particular commit, like a logical bookmark, and the files can at any time be reverted to or checked out at this snapshot. Usually a tag is created when QA identifies a release as stable. So if there is any problem in the near future, we can go back to this snapshot and start again.
A tag in Git can be added as
git tag -a <tag name> -m "<description>"
The -a switch tells Git to create an annotated tag with the given tag name. The -m switch attaches the given description to this tag.
G:\MyFirstGit>git tag -a Version1 -m "First tag version"
All the tags in the repository can be seen as
git tag
It will list down all the tags created for this repository.
G:\MyFirstGit>git tag
Version1
To check out a particular tagged revision,
git checkout <tag name>
This outputs
G:\MyFirstGit>git checkout Version1
M fileD.txt
Note: checking out 'Version1'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b new_branch_name

HEAD is now at 6cdfe88... Added fileD.txt
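That is a lot of information, with the summary that the checkout is done. "Detached HEAD" means that we are now looking at the tagged commit directly and are not on any branch.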
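The sketch below contrasts an annotated tag with a lightweight one and shows the detached HEAD state in a scriptable way. The tag name "Version1-light", the temporary directory and the placeholder identity are assumptions for illustration.

```shell
set -eu
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q
git config user.name "Demo User"       # placeholder identity
git config user.email "demo@example.com"
echo "D" > fileD.txt
git add fileD.txt
git commit -q -m "Added fileD.txt"
branch=$(git symbolic-ref --short HEAD)

git tag -a Version1 -m "First tag version"   # annotated: stored as its own tag object
git tag Version1-light                        # lightweight: only a name for the commit
annotated_type=$(git cat-file -t Version1)
light_type=$(git cat-file -t Version1-light)
echo "Version1 is a $annotated_type, Version1-light is a $light_type"

git checkout -q Version1                      # HEAD now points at a commit, not a branch
if git symbolic-ref -q HEAD >/dev/null; then head_state=attached; else head_state=detached; fi
echo "HEAD is $head_state"
git checkout -q "$branch"                     # return to the branch
```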

See all the repositories

To check all the remote repositories configured for our repository, we use the following command
git remote
Since we have connected to only one remote repository and nicknamed it "origin", our output is
G:\MyFirstGit>git remote
origin

Git Branches

When we work, we work on only one branch at a time. But a project may run multiple branches, e.g. one branch may be the development branch for a new version of the app, another may be a defect fixing branch for the old app version, and others may be branches for testing or staging, etc.
In essence, a branch is an independent line of development within the repository, used for a particular purpose, on which commits and pulls are performed separately.
To see all the branches currently associated with the repository, use the command
git branch -a
The -a switch tells Git to list both the local branches and the remote-tracking branches. The output should be something like the following, where "* (no branch)" appears because we checked out a tag earlier and are currently not on any branch
G:\MyFirstGit>git branch -a
* (no branch)
master
remotes/origin/master
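Creating and using a branch is not shown above, so here is a minimal sketch of it. The branch name "bugfix", the temporary directory and the placeholder identity are hypothetical. It uses "git branch <name>" to create a branch at the current commit and "git checkout <name>" to switch to it.

```shell
set -eu
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q
git config user.name "Demo User"       # placeholder identity
git config user.email "demo@example.com"
echo "A" > fileA.txt
git add fileA.txt
git commit -q -m "First Commit."
main_branch=$(git symbolic-ref --short HEAD)

git branch bugfix                      # new branch starting at the current commit
start_tip=$(git rev-parse bugfix)
head_tip=$(git rev-parse HEAD)         # same commit ID: nothing was copied

git checkout -q bugfix
echo "fix" >> fileA.txt
git commit -q -a -m "Fix on bugfix branch"
ahead=$(git rev-list --count "$main_branch"..bugfix)
echo "bugfix is $ahead commit(s) ahead of $main_branch"
git branch                             # lists local branches; * marks the current one
```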

To list the configured remotes (the remote-tracking branches alone can be listed with "git branch -r"), type
git remote show
The output would look like
G:\MyFirstGit>git remote show
origin

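To remove a remote
git remote rm <remote name>
The output for removing origin would look like
G:\MyFirstGit>git remote rm origin
This removes the remote named origin from our configuration, along with its remote-tracking branches. The repository on the server itself is not deleted.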

Working with patches

A patch is a text file that contains changes in source code. Once created, it can be sent to another user, who can use it to patch their repository. In this case, they do not need to pull the code from a shared repository.
A patch is created as
git format-patch <remote>/<branch>
When run on origin/master, it would look like
G:\MyFirstGit>git format-patch origin/master
0001-Modified-fileA.txt.patch
0002-Deleted-fileA.txt.patch
0003-Added-fileD.txt.patch
The 0001-Modified-fileA.txt.patch and other files are the patches that have been created, one for each commit that is on our current branch but not on origin/master. We can then send these patches to some other person, who could incorporate them in their repository as
git apply <patch file>
I could not produce meaningful output for this command in my own repository, because it already contains the changes from which the patch was created. So I cloned the repository in a different folder to check. "git apply" prints nothing when the patch applies cleanly, and the output was
G:\CloneOfFirstGit\MyFirstGit>git apply ..\..\MyFirstGit\0001-Modified-fileA.txt.patch
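The patch round trip can be reproduced end to end with two throwaway repositories. In this sketch the "sender" and "receiver" directory names, the "patches" output folder and the placeholder identity are assumptions; "git apply --check" is used first to test whether the patch would apply, without changing anything.

```shell
set -eu
work=$(mktemp -d)                      # hypothetical scratch directory
git init -q "$work/sender"
cd "$work/sender"
git config user.name "Demo User"       # placeholder identity
git config user.email "demo@example.com"
echo "This is my edit." > fileA.txt
git add fileA.txt
git commit -q -m "First Commit."
git clone -q "$work/sender" "$work/receiver"    # receiver has only the first commit

echo "See, I have made this modification." >> fileA.txt
git commit -q -a -m "Modified fileA.txt"
git format-patch -q -o "$work/patches" HEAD~1   # one patch file per commit after HEAD~1
patch=$(ls "$work/patches"/*.patch)
echo "created $(basename "$patch")"

cd "$work/receiver"
git apply --check "$patch"             # verify only; exits non-zero if it would not apply
git apply "$patch"                     # changes the working files; does not commit
receiver_commits=$(git rev-list --count HEAD)
```

Note that "git apply" only edits the working files, so the receiver still has to add and commit them. "git am" is an alternative that applies such a patch and creates the commit in one step.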

Creating my own git repository

My knowledge of bare repositories is limited. Please pardon me if I say something wrong.
Till now we used GitHub as the remote repository. But we may wish to set up a machine of our own as the repository server. This can be done with the following steps:
  1. Create a user called "git" on the server.
  2. Create a bare repository as
        git init --bare <repository name>
        A bare repository has no working directory; it holds only the history. Take a new directory as the place where we will create a repository for others. Run the command and the output will be as
        G:\BareGitRepo>git init --bare MyBareRepository
        Initialized empty Git repository in G:/BareGitRepo/MyBareRepository/
  3. Now add this bare repository as a remote of our working repository as
          git remote add origin git@<host>:<path to repository>
          The output will look like
            G:\MyFirstGit>git remote add origin2 git@localhost:MyBareRepository

Note: The server hosting the bare repository must be running an SSH service, by default on port 22. Without one, this was the error that I got.
G:\MyFirstGit>git push origin2 master
ssh: connect to host localhost port 22: Bad file number
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Note: While linking to this repository, I used the nickname "origin2" and not "origin". So do not get confused. It is not an error.
To read further about bare repositories, you could follow this link: http://git-scm.com/book/ch4-2.html.

If you see any problem, please do let me know so that I can update the article.
Please feel free to contact me or post comments.
