When two people work on the same branch and each commits several times without pushing, the one who pushes last must first either git pull --rebase or merge (git fetch / git merge) in order to get the other person's changes from the repository.

When they merge the changes together and you view the source tree with e.g. gitk, you see the commits in something that looks like a different "branch". How does git differentiate between these "auto-generated branches" and branches that you created by checking them out? Can you name those merge branches after they were created?

Edit:

What I mean by "auto-generated branch":

[image: auto-generated branch]

What I mean by "named branch":

[image: named branch]

In my understanding a branch, visually, is when the history graph forks into two. The fork of the code base in the first image was created by the scenario I described in my initial question; the second one was created by git checkout -b v2.0.

Tom M
  • Can you show an example (maybe a screenshot of gitk)? I don't understand what you are asking. – mkrieger1 Jan 25 '18 at 14:52
  • @mkrieger1 Updated the question with images and an explanation – Tom M Jan 25 '18 at 16:11
  • A branch in git is a label that refers to a commit, nothing more. – Lasse V. Karlsen Jan 25 '18 at 16:13
  • There is nothing "auto generated" here. Each dot represents a commit which a human made. – Code-Apprentice Jan 25 '18 at 16:15
  • A Git branch is just a pointer to a commit. Tags are also pointers to commits. Branches are movable, tags are meant to be static. [`git checkout`](https://git-scm.com/docs/git-checkout) may create branches (as pointers to commits) when invoked with the correct arguments, but it never forks the graph. – axiac Jan 25 '18 at 16:19
  • @axiac I am not talking about the difference between tags and branches. I am talking about the parts where the history after a commit splits. – Tom M Jan 25 '18 at 16:21
  • @TomM the part after a commit where the history splits is where two people committed different changes on top of the same common commit. Their versions of the project diverge at such points; they converge back to a common version on merges. Git doesn't create branches by itself, neither pointers to commits, nor history forks. – axiac Jan 25 '18 at 16:23
  • *"How does git differentiate between these "auto-generated branches" and branches that you created by checking them out?"* -- in fact one never generates history forks by checking out a branch/tag/commit. They generate the forks when they commit. – axiac Jan 25 '18 at 16:25

2 Answers

TL;DR

The branchy structure you are seeing typically comes about without using a lot of separate names. To avoid it, you would rebase rather than merging. (It's not a question of names, really, although there's a tie-in between names—Git's branch names and its remote-tracking names, and for that matter tag names, which are all specific forms of what Git calls a reference—and what you want here.)

Git is all about commits

The first and most important thing to remember is that Git is really all about commits. The commit is Git's raison d'être. A commit does several things at once, by storing both data—specifically, a complete snapshot of a source tree—and metadata. Let's look at an actual commit:

$ git cat-file -p HEAD | sed 's/@/ /'
tree 982dc557269a91826c64dd7e3c7d63c4ccfefc90
parent 8c8ddbd0821d552ff3c7e1b67c669dd7f11d63d7
author Junio C Hamano <gitster pobox.com> 1515188717 -0800
committer Junio C Hamano <gitster pobox.com> 1515188717 -0800

Git 2.16-rc1

Signed-off-by: Junio C Hamano <gitster pobox.com>

The tree line here gives the hash ID of the snapshot (Git stores everything by hash ID, which we'll come back to in just a moment). The parent line gives the hash ID of the parent commit. The author and committer lines tell you who wrote the commit (author) and who put it into the repository (committer), along with a time stamp (Unix-format seconds-since-epoch plus a time zone offset). The rest of the metadata is the commit message.

Every Git object—there are four types: commit, tree, blob (basically the internal form of a file), and annotated tag—has a unique hash ID. This hash ID is the "true name" of the object, and is how Git stores it in the repository database, and looks it up. Hence, to find a commit, Git needs one of these hash IDs.

The problem with these hash IDs is that they're quite useless to humans. So Git adds names, like branch names, tag names, and remote-tracking names. All of them have one primary job: the name remembers a hash ID. That way, instead of trying to remember 36438dc19dd2a305dddebd44bf7a65f1a220075b, I can just remember master (a branch name) or v2.16.0-rc1 (an annotated tag name—I'm going to skip some details here).

There are some important facts about commits, or in fact any Git object:

  • The hash ID is always unique. (See "How does the newly found sha1 collision affect git?"; follow some of the links for more discussion.)
  • The hash ID is determined strictly from the contents of the object. If you change anything about the object and store it back into the repository database, you get a new, different, unique hash ID.
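
As a rough illustration of the second point, here is a minimal Python sketch of how Git derives an object ID in a SHA-1 repository: it hashes a short header (the object type, a space, the content length in bytes, and a NUL byte) followed by the content itself. The function name git_object_id is made up for this example and is not part of Git.

```python
import hashlib


def git_object_id(obj_type, content):
    """Return the SHA-1 ID Git would assign to an object with this content."""
    # Header format: "<type> <length in bytes>\0", then the raw content.
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


original = git_object_id("blob", b"hello\n")
changed = git_object_id("blob", b"hello!\n")

print(original)
print(changed)
print(original == changed)  # False: different contents, different ID
```

Running git hash-object on a file holding the same bytes should print the same ID, which is why "editing" an object really means storing a new object under a new ID.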

What this means is that you cannot change anything about any commit. But you can copy objects to new (and different) objects. We also know that each commit lists its parent commits, as we see in the commit listed above. We'll make use of this idea soon. We also need one more item: a merge commit is a commit with two parents (or more, but we won't bother looking at this case).

Commits with their parent IDs form a Directed Acyclic Graph or DAG

In mathematical terms, a graph is a set of vertices V and edges E that connect those vertices. (See the diagrams on the Wikipedia page for examples.) In a directed graph, the edges have a direction: like one-way streets in a city, you can only travel one direction along the edge, from vertex (or in our case, Git commit) to vertex. These edges are called arcs to remind us that they're one way. (I'm not going to cover the acyclic part of DAG here even though it's important in a theory sense. It occurs naturally in Git, without any action on our part, and we don't have to care much.)

In our case, the direction is always backwards: from a later child commit, we can follow the arc leading to its parent. If we draw a chain of commits, we get something like this:

... <-o <-o <-o ...

where "newer" or "later" commits are towards the right. Each child commit points back to its parent, by virtue of storing the parent hash. Since the arrows always point backwards (left-ish), we can stop drawing them, which is good since there are not that many good arrow fonts to use on StackOverflow. :-)

Note that if we start a new chain with a child pointing back to an earlier parent, we get a divergence:

...--o--o--o--...
         \
          o--o--...

If a commit is a merge commit it has two arcs, leading back to both parents. It's these merge commits that rejoin these chains:

...--o--o--o--o---M
         \       /
          o--o--o

This last commit, marked M, is a merge commit with two parents.
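
To make the graph idea concrete, here is a small Python sketch (not how Git stores anything internally) that models the last drawing as a mapping from each commit to its parents. The one-letter commit IDs and the helper ancestors are invented for the example.

```python
# Each commit maps to the list of its parents (hypothetical one-letter IDs).
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],       # continues the upper chain
    "D": ["B"],       # starts the lower chain, diverging at B
    "M": ["C", "D"],  # merge commit: two parents rejoin the chains
}


def ancestors(commit):
    """Return the commit plus everything reachable by following parent arcs."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


merges = [c for c, ps in parents.items() if len(ps) > 1]
print(sorted(ancestors("M")))  # ['A', 'B', 'C', 'D', 'M']
print(sorted(ancestors("C")))  # ['A', 'B', 'C']
print(merges)                  # ['M']
```

Starting from M, both chains are reachable; starting from C, the lower chain is invisible. Nothing in the data marks either chain as "a branch": the shape comes purely from the parent links.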

Now, this—making new commits in general, and making merge commits in particular—is where branch and other names come in ... or, sometimes, don't!

Finding commits at all: we need names, aka references

We mentioned above that hash IDs (36438dc...) are useless to humans; we like names. So we have a name like master for finding this ID. Git needs them too, at least some of the time. In particular, while Git can rummage through the entire database and find every object—and it has maintenance commands like git fsck and git gc that literally do that—this is very slow in a big repository. The fast operation is to take a known hash ID and find the data that go with that ID.

So, in general, we have Git start with names like master, which find a commit like 36438dc.... Git can show us that commit, or check it out, or whatever, using its information, especially its tree line and sometimes its parent. Or we can have Git step back one commit in history, to 36438dc...'s parent, which is of course another commit. We can have Git extract that commit, or look at its parent, and so on.

Whatever we do, though, it's the names that get this process started. The names identify one specific commit, from which we (or Git) can work backwards.

How branch names behave when we make new commits

When humans go to add commits to a Git graph, we do it by doing:

$ git clone <url>   # at least for the first time
$ cd <repository>   # as necessary
... do some work ...
$ git add file1 file2 ...   # or git add -u, etc
$ git commit

The git add step copies files from the work-tree, where we did our work on them, into the index / staging-area, replacing the previous version that was in the index / staging-area. The git commit step makes the new commit from whatever is now in the index: all the old files the way they were before, and the changed files the way they are now that we copied them into the index with git add.

When git commit makes a new commit, it goes through the following steps (not necessarily in this order):

  1. Turn the index (all the files that go into the snapshot) into a tree object.
  2. Find the current commit's hash ID. That will be the new commit's parent.
  3. Collect up our name and email address and time-stamp for the author and committer lines.
  4. Collect up a commit message.
  5. Write all of these out as a new commit object. Git assigns the new object its new, unique hash ID.
  6. Write the new hash ID into the current branch.

Let's look closely at steps 2 and 6. If we did a git checkout master, so that we are (as git status puts it) on branch master, the current commit's hash ID is the one stored under the name master. That's where we get the hash ID for step 2. In step 6, we replace the hash ID with the new commit we just created.

In other words, when we make a branch "grow" by using git checkout and eventually git commit, we're telling Git to make new, permanent, read-only commit objects whose parent is what the branch tip was before the commit, and to update the name to point to the new commit:

...--o--o--*   <-- master

becomes:

...--o--o--*--@   <-- master

The name master, in effect, moves to point to the latest commit.
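
The steps above, and steps 2 and 6 in particular, can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: the names objects, refs, head and commit are hypothetical, the "tree" is just a string, and hashing repr(record) stands in for Git's real object format.

```python
import hashlib

objects = {}             # hash ID -> commit record (toy object database)
refs = {"master": None}  # branch name -> hash ID of the branch tip
head = "master"          # name of the currently checked-out branch


def commit(tree, message, author="A U Thor"):
    """Create a commit on the current branch and move the branch name."""
    parent = refs[head]                       # step 2: current tip is the parent
    record = (tree, parent, author, message)  # steps 1, 3 and 4, simplified
    new_id = hashlib.sha1(repr(record).encode()).hexdigest()  # step 5
    objects[new_id] = record
    refs[head] = new_id                       # step 6: the name now points here
    return new_id


first = commit("tree-1", "initial commit")
second = commit("tree-2", "second commit")

print(refs["master"] == second)     # True: the name moved to the newest commit
print(objects[second][1] == first)  # True: its parent is the previous tip
```

The existing commit is never modified; only the entry in refs changes, which is all that "the branch grew" amounts to.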

But Git is distributed

While we're making our changes and committing them, other people have done git clone of the same origin repository, and have been adding their own commits. Your own repository (Tom's repo) might have this now:

             @   <-- master
            /
...--o--o--*   <-- origin/master

Here, we're using the name origin/master—a remote-tracking name rather than a branch name—to remember where master was in the Git repository on origin. The branch name master is "Tom's master", not "origin's master". They were originally both pointing to the same commit, the one I marked *, but since then, you made a new commit, with a new, unique ID.

Meanwhile, though, Sharon also did a git clone and has been working:

...--o--o--*   <-- master, origin/master

and now she makes a new commit in her repository, which gets a new, unique ID, different from your new and unique ID:

...--o--o--*   <-- origin/master
            \
             ●   <-- master

If we were to somehow combine Sharon's repository and your repository, let's see what we'd get. Remember, each commit is uniquely identified by its hash ID, so the three middle-row commits are the same in your repository and in Sharon's:

             @   <-- (Tom's master)
            /
...--o--o--*   <-- origin/master
            \
             ●   <-- (Sharon's master)

This forking / branching behavior has already occurred, even if we haven't combined your repository and Sharon's yet. It's occurred in a sort of virtual sense: it will be there once we do the combining.
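
Here is a small Python sketch of that "virtual" fork, assuming each repository is modelled as a mapping from commit ID to parent IDs. The IDs (o1, o2, star, tom_commit, sharon_commit) are placeholders for real hash IDs.

```python
# commit ID -> parent IDs; "star" is the shared commit drawn as * above
shared = {"o1": [], "o2": ["o1"], "star": ["o2"]}

tom = dict(shared, tom_commit=["star"])        # Tom's @
sharon = dict(shared, sharon_commit=["star"])  # Sharon's new commit

# Equal IDs mean identical objects, so combining repositories is a plain union.
combined = {**tom, **sharon}

children_of_star = sorted(c for c, ps in combined.items() if "star" in ps)
print(len(combined))     # 5
print(children_of_star)  # ['sharon_commit', 'tom_commit']
```

The shared commits collapse onto themselves, and the two new commits both name star as their parent. The fork was determined the moment each commit was made, before anyone fetched or pushed.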

So, let's say Sharon now runs git push origin master. Her Git will call up the third Git at origin and send it her commit ●. Her Git will then ask origin's Git to set origin's master to point to ●. If all goes well, which it probably does, origin's Git now has:

...--o--o--*--●   <-- master

When your Git calls up origin and downloads new commits, your Git gets commit ●, which is new to your Git. Your Git remembers where origin's master is by updating your own origin/master, giving you:

             @   <-- master
            /
...--o--o--*--●   <-- origin/master

This is the same diagram we drew before; the only difference is that we drew commit ● on the same row, instead of a lower-down row.

It's now your job, since this has landed in your repository (not Sharon's), to do something about this. If you just naively run git merge you will merge your commit @ with Sharon's ●:

             @--M   <-- master
            /  /
...--o--o--*--●   <-- origin/master

This merge gets added just like any ordinary commit, except that instead of one parent ... line, it has two: one for your @ commit and one for Sharon's ●.

If you instead use git rebase, you will copy your @ commit to a new-and-improved commit ○. The difference between your original @ and the new one is that your new one will build up from Sharon's, so its parent will be ●:

             @
            /
...--o--o--*--●   <-- origin/master
               \
                ○   <-- master

Over time, using git rebase instead of git merge will give you a linear structure rather than a branchy one.
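
The difference between the two outcomes can be checked with a short Python sketch. It models the graphs from the last two drawings as mappings from commit to parents; the commit names and the helpers reachable and is_linear are invented for this example.

```python
base = {"star": [], "tom": ["star"], "sharon": ["star"]}

# git merge: one new commit with two parents; nothing is copied.
merged = dict(base, M=["tom", "sharon"])

# git rebase: Tom's commit is copied so that its parent is Sharon's commit.
rebased = dict(base, tom_copy=["sharon"])


def reachable(graph, tip):
    """Return every commit found by walking parent links back from tip."""
    seen, stack = set(), [tip]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def is_linear(graph, tip):
    """True if no commit reachable from tip has more than one parent."""
    return all(len(graph[c]) <= 1 for c in reachable(graph, tip))


print(is_linear(merged, "M"))                  # False
print(is_linear(rebased, "tom_copy"))          # True
print(sorted(reachable(rebased, "tom_copy")))  # ['sharon', 'star', 'tom_copy']
```

After the rebase, the original tom commit still exists in the mapping, but nothing reachable from master leads to it, which matches the abandoned @ in the drawing.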

torek
  • Hell, thank you so much for writing all this down. I just realized how broad and imprecise my question was. You helped me out a lot anyways! – Tom M Jan 25 '18 at 21:39
  • After reading your answer, I have one more question - as a branch is basically a reference/name to a specific commit, how would you, in gitspeak, call two commits pointing to the same parent commit? – Tom M Jan 25 '18 at 21:45
  • If you mean you have commit aaaauglyhash1 and commit bbbbuglyhash2, both of which have as parent commit cccchash3? I don't think there's a specific word for that. – torek Jan 25 '18 at 22:07

Nothing is "auto-generated" in the git history that is illustrated in gitk. Each dot represents a commit which a human created. Each line between two dots shows the relationship between commits. If a line goes upwards from one commit to another, the upper commit is a child of the lower commit and the lower commit is a parent of the upper commit.

Note that you can create tags and branches which point to any commit that you wish. If you want a static marker at a particular commit, just create a tag with git tag.

In my understanding a branch, visually, is when the history graph forks into two.

A branch is simply a pointer to a commit. When you check out a branch and then make another commit, the branch moves to the new commit. This means that creating two branches at the same commit and committing two different sets of changes causes the forking that you see. Here is an example.

Say you have a commit history which looks like

A-B <- branchA, branchB

and you do

$ git checkout branchA

# make some changes
$ git commit -am 'Changes on branchA'

Now your history looks like this:

A-B  <- branchB
   \
    C <- branchA

Then you do

$ git checkout branchB

# make some changes
$ git commit -am 'Changes on branchB'

Now the history looks like this:

A-B-D  <- branchB
   \
    C <- branchA

Note the "fork" at commit B. This was caused by the two commits on two different branches. There is nothing automatic here. The history reflects the actions of the human programmers.
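
The same sequence can be replayed as a tiny Python model, assuming a branch is nothing more than a name mapped to a commit. The commit helper here is a hypothetical stand-in for git commit on the named branch.

```python
parents = {"A": None, "B": "A"}               # commit -> its single parent
branches = {"branchA": "B", "branchB": "B"}   # both names start at commit B


def commit(branch, new_id):
    """Add a commit on top of the branch tip, then move the branch name."""
    parents[new_id] = branches[branch]
    branches[branch] = new_id


commit("branchA", "C")  # git checkout branchA; git commit
commit("branchB", "D")  # git checkout branchB; git commit

children_of_B = sorted(c for c, p in parents.items() if p == "B")
print(branches)       # {'branchA': 'C', 'branchB': 'D'}
print(children_of_B)  # ['C', 'D']
```

The fork at B shows up only in parents, as two commits naming B as their parent; the branch names just record where each line of work currently ends.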

Code-Apprentice