I didn’t give up, however. With a huge, newly acquired respect for this tool (and for Linus Torvalds, who had written this vastly complex and, for me at the time, mysterious thing), I delved deep into the documentation, YouTube videos and Stack Overflow answers.
That was about ten years ago. Since then, I have learned to love the way Git works a little more with every new piece of information about it that settled in my mind.
Now the time has come to hand over my experience to my colleagues, who are at the very start, as I once was too. So I put together a PowerPoint presentation, still thinking about the things that led me into all those mistakes in the beginning.
This is the gist of my presentation, and I think it can also be useful to other beginners out there. I was inspired by the official book about Git, which is available for free on the official Git site (https://git-scm.com/book/en/v2). I highly recommend reading it if you really want to be confident in handling Git. Take this article as the bare minimum to know at the start of the path to excellence.
Version Control Systems (VCS) in a nutshell
There are two ways to handle versioning. From this perspective, version control systems (VCS) are divided into two basic groups:
- Central VCS
- Distributed VCS (aka DVCS)
A central VCS stores all the data on the server. Every time you make a change, the system doesn’t take it into account until it is stored on the server where the VCS is running.
This has its drawbacks, though:
- The server is a single point of failure: once it is down, no changes can be checked in anymore
- Conflicts between changes have to be resolved against the server, which makes the workflow cumbersome
- If the central disks are destroyed, the whole team loses everything, because nobody else holds a full copy of the data.
Distributed VCS – DVCS
A DVCS is based on the principle that every user stores all the data on their local machine, in a state that is the same as one of the states stored on the central server. In other words, the whole history up to a certain point is contained on the user’s machine.
Users make changes on their local machine. Once the work is ready, they send it to the central server. When needed, a user can update the local clone of the central repository to the latest state stored on the server.
This addresses the aforementioned drawbacks of the central VCS.
OK, so let’s start using the benefits of a versioned codebase.
To install Git, go to the official page and follow the instructions there: http://git-scm.com/downloads
First time setup
- check that Git is installed and ready
Easy so far, isn’t it? To check that you have the software installed and ready to use, run the following command in the command line of your system:
git --version
You should see output like in the following picture:
First time setup
- initial user identification setup
Now that we are sure we have it installed properly, some initial setup should be done.
Run the following commands with info about yourself.
git config --global user.name "Username"
git config --global user.email email@example.com
Each commit (more on commits later) will contain this info. For collaboration, it is practically a necessity to know who the author of each change is.
These commands added some info to the global configuration of Git. You can see the Git configuration by running the following command:
git config --list
As a result, you should see something like this
First time setup
- git help
Git contains detailed documentation. Whenever you are not certain about a command, you can run the following to see its documentation:
git <command> --help
Hosting a repository on public servers
There are some well-known places dedicated to hosting Git repositories. One of them is GitHub.
You can create an account there and use it to manage your own repositories. I won’t elaborate on this, as this tutorial is meant to cover only the basics of the Git command line tool.
For more info, see the GitHub site: https://github.com/
Cloning the repository
Here is a visual representation of a public repository page hosted on GitHub
To clone a repository from GitHub, you need to go to https://github.com and follow these steps.
- go to the particular repo (e.g. https://github.com/PetrBorak/git-for-db-devs)
- find the clone button (the green button on the screenshot)
- copy the URL to the clipboard by clicking on the icon next to the input with the URL
- run the git clone <paste the URL from the clipboard here> command from the command line on your local machine
For example, let's say I have the URL of the above repository in the clipboard. The command would look like this:
git clone https://github.com/PetrBorak/git-for-db-devs.git
The output of git clone in your IDE or command line tool of choice should be similar to the screenshot below
You can see that a new folder appeared in the filesystem. Among other things, this new folder contains the .git directory. This is where Git stores its information about the newly created local repo.
Checking the state of the repository
Once you have the repository cloned, you should be able to check its state on your local machine.
Go to the newly created folder and run the following command:
git status
The result should be similar to the following image:
You can see that the output says you are on the branch master and that you have a clean working directory. That means the local state of your repository is up to date and aligned with the last snapshot of the remote repository hosted on the server.
Besides the .git directory, you can see two additional files in the filesystem:
- .gitignore file
- test.txt file
These files came from the remote repository as a result of the clone.
The workflow with Git
From Git’s point of view, a file switches between the following states:
- modified – you have changed the file in the working directory, but the change is not staged yet, so it won’t go into the next commit
- staged – you have added the changed file to the staging area, so it is prepared to be part of your next commit
- committed – the file is committed and included in the commit snapshot in the .git directory
This is the schema of the flow
Adding files to staging area
Open the test.txt file, make some changes and save it again. Now run the git status command again. You should see output in the console as in the following picture:
Now the console says you have changes not staged for commit. That means, according to the previous schema, you have changes in the working directory that are not in Git’s staging area.
To add the files to the staging area, you run the following command:
git add test.txt
to add an individual file to the staging area, or:
git add -A
to add all changed files to the staging area.
If you run one of those commands in the working directory and then git status again, you should see the following output
The “Changes to be committed” section says you have changes that are not yet stored in the local repository but are ready to be added in the next commit. In other words, the changed file is stored in the staging area. The process of adding the changes to the local Git repository is called committing.
To add files from staging area to local repository, you need to run the following command:
git commit -m "my first commit"
The -m parameter is followed by a message in quotes. The message becomes part of the commit, and it is what others will see once they pull your changes. It serves as an explanation of why the commit was made.
If you run the git status command again, you should be given the following output.
This time the console says “Your branch is ahead of origin/master by 1 commit”
This means you have changes stored in your local repository that are not yet stored in the repository on the server. Thus you are one commit ahead.
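If you want to watch a file travel through the three states without touching a real project, here is a minimal sketch. It assumes a throwaway repository created with mktemp, and the user name and email are placeholders I made up for the example. It uses git status --short, a compact form of the status output.

```shell
# Scratch repository in a temporary directory (nothing here touches your real repos)
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the first branch master, as in this article
git config user.name "Example User"          # placeholder identity, local to this repo
git config user.email "user@example.com"

# A first commit, so that test.txt is already known to Git
echo "first version" > test.txt
git add test.txt
git commit -q -m "initial commit"

# modified: changed in the working directory, not staged
echo "a change" >> test.txt
modified=$(git status --short)               # " M test.txt"

# staged: the change is in the staging area
git add test.txt
staged=$(git status --short)                 # "M  test.txt"

# committed: the working directory is clean again
git commit -q -m "my first commit"
committed=$(git status --short)              # empty

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' "$modified" "$staged" "$committed"
```

In the short format, the first column describes the staging area and the second column the working directory, which is why the M moves from the second position to the first after git add.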
You can imagine the commits as a linked list holding the record of changes stored in the Git repository. Each time you run git commit, you create one more record. The result is this schema in Git’s internal representation.
As you can see, each commit contains a few pieces of information:
- The tree – the links to the files in particular states. Git works like a self-contained filesystem. The .git folder contains all the files and their stored states. They form a tree structure, which is identified by a unique hash. This hash is stored in the commit as the tree property, as you can see in the schema above.
- Parent – the hash of the parent commit. In Git’s internal representation, commits form a simple linked list. There are cases where a commit has two parents, however (but I won’t cover that in this article for the sake of simplicity)
- Info about author and committer – this is the info we set up earlier by running git config --global <options>.
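You can look at these fields yourself. The following is a small sketch, again in a throwaway repository with a placeholder identity (both are my assumptions, not part of the original setup). It prints the raw content of the latest commit with git cat-file -p and then picks out the tree and parent hashes.

```shell
# Scratch repository with two commits
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "hello" > test.txt
git add test.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)                  # hash of the first commit

echo "more" >> test.txt
git commit -q -am "second"

# Raw commit object: tree, parent, author, committer, blank line, message
git cat-file -p HEAD

# Extract the individual fields from the second commit
tree=$(git cat-file -p HEAD | sed -n 's/^tree //p')
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "tree:   $tree"
echo "parent: $parent (the first commit is $first)"
```

The very first commit in a repository has no parent line at all, which is how the linked list ends.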
A branch in Git is simply a lightweight movable pointer to one of the commits. You have the list of commits mentioned above. A branch can be imagined as a record containing a link to one of them.
Here we have a list of commits (the blue rectangles) and the branches pointing to individual commits (the green rectangles). On top of that, Git uses HEAD. HEAD is an internal pointer either to a particular branch or directly to one of the commits.
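To make the pointers a bit more tangible, here is a hedged sketch in a scratch repository (the temporary directory and the identity are placeholders of mine). It asks Git where HEAD points, resolves the branch to a commit, and then detaches HEAD so that it points directly at a commit instead of a branch.

```shell
# Scratch repository with a single commit
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "hello" > test.txt
git add test.txt
git commit -q -m "first"

# HEAD points to a branch, and the branch points to a commit
head_target=$(git symbolic-ref HEAD)         # refs/heads/master
branch_commit=$(git rev-parse master)
head_commit=$(git rev-parse HEAD)
echo "HEAD -> $head_target -> $branch_commit"

# Detach HEAD: it now points directly at the commit, not at a branch
git checkout -q --detach
detached=$(git symbolic-ref -q HEAD || echo "detached")
echo "after detaching, HEAD is: $detached"
```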
Working with a branch
To create a branch that will point to the currently checked-out commit, we have the following command:
git branch <name-of-the-branch>
And to switch to this branch, Git uses an operation called checkout:
git checkout <name-of-the-branch>
To create a branch and check it out in one go:
git checkout -b <name-of-the-branch>
After running the last command, the console gives us the following output:
Now we are on the newly created branch. To elaborate a little more, we created a branch, which is a pointer to the commit that HEAD pointed to, and as we add new commits this branch will move along to point to the latest one.
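The moving pointer can be observed in a scratch repository. This is only a sketch: the branch name feature and the identity are hypothetical values I picked for the example.

```shell
# Scratch repository with a single commit on master
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "hello" > test.txt
git add test.txt
git commit -q -m "first"

# Create a branch and switch to it; it starts at the same commit as master
git checkout -q -b feature                   # "feature" is a made-up branch name
at_creation=$(git rev-parse feature)

# A new commit moves only the branch we are on
echo "more" >> test.txt
git commit -q -am "work on feature"

echo "master:  $(git rev-parse --short master)"
echo "feature: $(git rev-parse --short feature)"
```

After the second commit, master still points to the first commit, while feature has moved one step ahead.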
You have made changes to your files locally and stored those changes in your local repository: you made a commit. At this moment, the repository on your local disk differs from the state of the remote repository. You want everybody to see your changes. You want to make them public.
So you need to push:
git push --set-upstream origin master
The --set-upstream parameter is followed by <remote> and <name-of-the-branch>.
It tells Git that the branch you are on (in this case master) should track the branch called master in the remote repository.
Once you run this command, the changes in your local repository are propagated to the remote repository. You should see something similar to this in your console:
Git counted the objects, compressed them and wrote them to the remote repository.
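If you would like to try a push without a GitHub account, a bare repository on your own disk can stand in for the server. The sketch below assumes exactly that; the directory names remote.git and local and the identity are placeholders of mine.

```shell
# A bare repository plays the role of the server
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# Clone it (Git warns that the repository is empty, which is expected here)
git clone -q remote.git local
cd local
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

# Commit locally, then push and set up tracking in one step
echo "hello" > test.txt
git add test.txt
git commit -q -m "my first commit"
git push -q --set-upstream origin master

# The remote now has the same commit, and master tracks origin/master
local_commit=$(git rev-parse master)
remote_commit=$(git --git-dir=../remote.git rev-parse master)
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')
echo "local:    $local_commit"
echo "remote:   $remote_commit"
echo "upstream: $upstream"
```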
Viewing information about the remote repository
Git stores two types of branches in your local repository
- Remote-tracking branches, which mirror the branches of the remote repository
- Local branches created in your environment, which can track the remote ones
To list all the branches in your local repository, both local and remote-tracking, run the following command
git branch -a
In our local repository, you should see a result like this:
The green color and the asterisk before the branch "master" tell us this is the local branch we currently have checked out.
The second line shows that the remote HEAD pointer points at the remote master branch. And the third line lists the remote-tracking branch that mirrors the remote master branch.
Git internally works against remotes that are, in the simplest scenario, defined during the git clone command.
The remote holds the address of the remote repository, to which we send our changes with git push.
To see the information about the remote, run
git remote -v
This is the result for our current repository:
This output tells us we have a remote named origin, along with the endpoints it uses for fetch and push operations.
Pulling from repository
We already mentioned the push operation. But we also need some way to bring changes from the remote repository to our local one.
This is what the following commands do:
git fetch
git pull
The "git fetch" command is a simple Git operation. It contacts the remote repository and updates our local copies of the remote branches, which we mentioned before.
The "git pull" command can be seen as a kind of extension of "git fetch". It has the following flow:
- Do the fetch as "git fetch" does
- Look up the remote branch that the current local branch tracks
- Merge that remote branch into the current local branch
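The difference between the two commands is easier to see with two clones of the same repository. This is a sketch under my own assumptions: a local bare repository stands in for the server, and the clone names alice and bob are invented for the example. I use git pull --ff-only so that the pull only succeeds in the simple case where the local branch can just be moved forward.

```shell
# "Server" plus a first clone that pushes the initial commit
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

git clone -q remote.git alice                # empty-repository warning is expected
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"                 # placeholder identity
git config user.email "alice@example.com"
echo "one" > test.txt
git add test.txt
git commit -q -m "first"
git push -q --set-upstream origin master
cd ..

# Second clone starts from the first commit
git clone -q remote.git bob

# Alice publishes another commit
cd alice
echo "two" >> test.txt
git commit -q -am "second"
git push -q origin master
cd ../bob

# fetch updates origin/master only; the local master stays where it was
git fetch -q
after_fetch_local=$(git rev-parse master)
after_fetch_remote=$(git rev-parse origin/master)

# pull additionally merges origin/master into the local master
git pull -q --ff-only
after_pull_local=$(git rev-parse master)

echo "after fetch: master=$after_fetch_local origin/master=$after_fetch_remote"
echo "after pull:  master=$after_pull_local"
```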
Once we are all set up and using our Git repository with confidence, handling it with the Git commands, we will also need to bring the changes stored in one branch into another (destination) branch.
This is called merging.
Let me explain it with the simplest scenario
Merging branches – fast forward
Let's assume we have the state of our branches as in the following scheme
We check out the master branch and want to include the changes in the hotfix branch there.
The commands we will use are
git checkout master
git merge hotfix
We are already familiar with the git checkout command. Here, it switches us to the master branch.
The next command, "git merge hotfix", takes the changes from the hotfix branch and brings them into the branch we are currently on, the "master" branch.
If you look at this situation in the scheme, all Git needs to do is move the pointer of the "master" branch to the right, one step ahead, to the commit of the "hotfix" branch. This way the "master" branch acquires all the changes that are present in the "hotfix" branch, and the merge is finished.
This type of merge is called "fast forward". We simply move the pointer to the child commit and the merge is done.
Pretty simple, and neat, isn’t it?
The state of the repo can then be visualized by the following schema:
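The fast-forward case can also be reproduced in a scratch repository. This is a minimal sketch with a placeholder identity; the commit messages are made up to mirror the scheme.

```shell
# Scratch repository: master with one commit, hotfix one commit ahead
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "hello" > test.txt
git add test.txt
git commit -q -m "commit 01"

git checkout -q -b hotfix
echo "fix" >> test.txt
git commit -q -am "commit 02 (hotfix)"

# Merge hotfix into master: Git only has to move the master pointer
git checkout -q master
git merge -q hotfix

echo "master: $(git rev-parse --short master)"
echo "hotfix: $(git rev-parse --short hotfix)"
echo "commits on master: $(git rev-list --count master)"
```

Both branches end up on the same commit and no extra commit is created, so master contains exactly the two commits made above.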
Merging branches – three way merge
Now imagine we have a slightly different situation. The next scheme shows it.
You can see that the "master" branch is not simply one step behind the "feature" branch as in the previous situation.
In this scenario, we need to take the changes from "commit 04", which are not included in master's "commit 05" and its ancestors, and figure out how to bring these changes into the "master" branch.
Git approaches this situation by doing a so-called "Three Way Merge".
It determines the most recent commit that is a common ancestor of both the "feature" branch and the "master" branch. Then it automagically computes the changes from this commit forward towards each of these branches. After that, it creates a new commit on top of the "master" branch with the result of the merge and moves the "master" branch pointer to this commit.
The process of this three way merge can be seen in the following scheme
And this is the result
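Here is a sketch of the same situation in a scratch repository, assuming the two branches change different files so that no conflict has to be resolved by hand. The file names and the identity are placeholders of mine. The git merge-base command prints the common ancestor Git starts from.

```shell
# Scratch repository with a common base commit
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "base" > test.txt
git add test.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

# The branches diverge: one commit on feature, one on master
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "feature commit"

git checkout -q master
echo "master work" > master.txt
git add master.txt
git commit -q -m "master commit"

# The common ancestor used as the starting point of the three-way merge
merge_base=$(git merge-base master feature)

# The merge creates a new commit on master with two parents
git merge -q --no-edit feature
parents=$(git rev-list --parents -n 1 HEAD)  # "<merge> <parent 1> <parent 2>"

echo "merge base:   $merge_base"
echo "merge commit: $parents"
```

The merge commit is one of those commits with two parents: the first is the previous tip of master, the second is the tip of feature.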
At this point, even if you started with a totally clean slate regarding Git, you should be able to handle the basic scenarios. Most of the time, you should be fine with the steps described here. But there will be moments when deeper knowledge is required.
All the same, I hope you are no longer afraid of more complex situations, as you now have an idea of how Git does its magic, and other scenarios can be described with the mental model acquired from this article.