
As someone relatively new to Git, I have recently (and finally!) understood that a branch is actually just a pointer to a particular commit, and that sometimes it might be better to rephrase "which branch a commit belongs to" as "from which branches is a commit reachable".

For example, the following diagram is from the official Git documentation:

[Figure: Git tree with two branches, master and iss53]

In this image, I would intuitively think that commit C4 "belongs" to the branch master and commits C3 and C5 belong to iss53. But what about C0 through C2? Would they belong to both branches? Or must I say they are "reachable" by branches master and iss53?

This gets more complicated once I merge iss53 into master:

[Figure: Git tree with branch iss53 merged into master]

Since branch iss53 was merged into master, does that make commits C0 through C2 belong to master "more" than iss53?

What if I delete branch iss53 after the merge? Which branch would commits C3 and C5 belong to? After thinking about it more, it seems that after the merge, commits C4, C3, and C5 are "equal" in terms of the branching history and I can't tell which branch the three of them belong to. This is because after deleting iss53, there doesn't seem to be any information as to whether C4 belonged to any historical branch any more than C3 and C5.

I have found this answer which says that it is better to think about this in terms of "from which branches can this commit be reached". But does that mean C4, C3, and C5 are all reachable from the master branch? And how do you handle the branching parentage that happens in the diagram? Does that matter?

Also, the answer I linked to stated that there could be cases where a commit cannot be reached by any branch. How can that happen? And what are its implications?

But my main question remains: How do I associate commits with branches?

P.S. A side/off-topic question that stems from this post would be: Can a commit have more than two parents?

hpy
  • Yes, a commit can have more than two parents. – evolutionxbox Jul 28 '20 at 18:24
  • As written, I'm tempted to call this "unclear what you're asking". You are asking a lot of clarification questions, all of which are different from your title question. The question you link to seems to accurately answer it. I think the confusion you're having is that in your last diagram, all of the commits are in master (or are reachable by master). Also, take a look at octopus merge for more than 2 parents. – TTT Jul 28 '20 at 18:26
  • @TTT: Thank you for your critique. I must admit learning about branches is a confusing process for me and I'm sorry if my question wasn't clear. I *think* what I am trying to ask is how do you think about the relationship between commits and branches under the myriad of situations I mentioned in the post? I want to be more specific but I confess it is hard for me. If you can suggest a better way to do this I am all ears. – hpy Jul 28 '20 at 23:24
  • That makes sense. Understanding the DAG takes some getting used to (see torek's answer). Note your edit asks how an orphaned commit is possible, and it is as Greg Burghardt describes: for example, if a commit is only reachable by one branch and that branch is deleted, the commit is orphaned and will eventually get garbage collected, unless something else points to it (like a tag or reflog entry; see torek's answer again). – TTT Jul 29 '20 at 13:37
  • And BTW, to answer your title question, it's important to realize that the wording should actually be, "How to tell which **branches** a commit belongs to?" More info here: https://stackoverflow.com/q/2706797/184546 – TTT Jul 29 '20 at 13:39

2 Answers


Commits do not belong to a branch. There is no ownership. A branch is a pointer to a commit. Every commit except a root commit has one or more parent commits. Once branches have been merged together, tracing back through the history of a branch no longer follows a single straight line. You'll need to reorient your view of commits and branches.

Commits exist in many branches.

Commits can also exist in no branches at all.

Conceptually, a Git repository is a big graph of linked nodes, much like a linked list in which a node may point back to more than one earlier node (only a root commit points back to none). A "branch" is just a marker pointing to one of the nodes. A node in Git is called a commit. Deleting a branch in Git just deletes the pointer to the commit, but does not delete the commit object itself. You can usually recover branches you accidentally deleted, because the commits are still in the database and still linked to each other, and a branch is just a pointer: a bookmark, if you will.

But does that mean C4, C3, and C5 are all reachable from the master branch?

Yes, that is precisely what it means. All of those commits are reachable, because commit C6 points to 2 different commits: C5 and C4.
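
To make the reachability idea concrete, here is a minimal Python sketch of the merged diagram from the question. It is a toy model, not how Git stores anything: the `PARENTS` and `BRANCHES` dictionaries and both helper functions are names made up for illustration. In a real repository, `git branch --contains <commit>` answers the same question.

```python
# Toy model of the merged history in the question's second diagram.
# Each commit maps to the list of its parents; C6 is the merge commit.
PARENTS = {
    "C0": [],
    "C1": ["C0"],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C2"],
    "C5": ["C3"],
    "C6": ["C4", "C5"],
}

# A branch name is just a pointer to one commit (hypothetical in-memory model).
BRANCHES = {"master": "C6", "iss53": "C5"}


def reachable_from(tip, parents):
    """Return the set of commits reachable from `tip` via parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def branches_containing(commit, branches, parents):
    """Return the sorted names of all branches from which `commit` is reachable."""
    return sorted(
        name
        for name, tip in branches.items()
        if commit in reachable_from(tip, parents)
    )


if __name__ == "__main__":
    for commit in ("C2", "C3", "C4", "C6"):
        print(commit, branches_containing(commit, BRANCHES, PARENTS))
```

In this model C3 is reachable from both master and iss53, C4 only from master, and deleting the `iss53` entry changes nothing about which commits master can reach.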

how do you handle the branching parentage that happens in the diagram? Does that matter?

Commit C6 has two parents. This means two branches were merged together. That's how you handle the "branching parentage." Commits with more than one parent were created with a git merge or a git pull (which, by default, is a git fetch followed by a git merge).

Greg Burghardt

To add to Greg Burghardt's answer, reachability is indeed the key concept here. The commit graph, complete with the hash IDs and arrows, is the be-all and end-all, as it were. The branch names just give you—and Git—an easy entry point into the graph (but see git gc in the next paragraph).

The commit graph takes the form of a Directed Acyclic Graph or DAG. The system as a whole requires that a commit be reachable from some external name—a branch name will do, but so will a tag name, or even a Git reflog entry—to keep the commit "live". The maintenance program git gc will, when asked, scour through the entire commit database, finding commits that are not reachable from any external name, and prune them from the graph. Commits that are reachable from a name, or from a commit that itself is reachable from a name, remain in the graph. Commands that add new commits to the graph often end by running git gc --auto, which tells git gc to poke around a bit, guess whether this kind of maintenance is wise at this time, and if so, do a maintenance run.
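
The "keep what is reachable from a name, prune the rest" idea can be sketched as a small mark-and-sweep in Python. This is only an illustration under simplifying assumptions: the function names and the `refs/tags/keep-me` tag are hypothetical, and the sketch ignores the expiry periods and reflog handling that the real git gc applies before it removes anything.

```python
# Toy model of the un-merged history: master at C4, iss53 at C5.
PARENTS = {
    "C0": [],
    "C1": ["C0"],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C2"],
    "C5": ["C3"],
}


def live_commits(parents, refs):
    """Mark phase: collect every commit reachable from any external name."""
    live = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit in live:
            continue
        live.add(commit)
        stack.extend(parents[commit])
    return live


def prune(parents, refs):
    """Sweep phase: return a copy of the graph holding only live commits."""
    live = live_commits(parents, refs)
    return {commit: ps for commit, ps in parents.items() if commit in live}


if __name__ == "__main__":
    # iss53 deleted without being merged: nothing points at C5 any more.
    after_delete = {"refs/heads/master": "C4"}
    print(sorted(prune(PARENTS, after_delete)))

    # Same deletion, but a (hypothetical) tag still points at C5.
    with_tag = {"refs/heads/master": "C4", "refs/tags/keep-me": "C5"}
    print(sorted(prune(PARENTS, with_tag)))
```

With only master left, C3 and C5 drop out of the pruned graph; with the tag in place, all six commits stay, because any external name is enough to keep a commit live.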

Other parts of Git will do a graph walk whenever necessary and appropriate. The git log command, for instance, does one, starting from some given commit(s) and working with the DAG. The graph walk uses a queue (as many graph-walking algorithms do) and keeps track of visited commits, so that it can visit each commit once, even if there are multiple ways to get to it.
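
A queue-based walk with a visited set can be sketched in a few lines of Python. This is a plain first-in, first-out walk over a toy graph, with `walk` and `PARENTS` as made-up names; it is not meant to reproduce the order in which git log actually prints commits, only to show why a commit with two paths leading to it is still visited once.

```python
from collections import deque

# Toy model of the merged history; C2 is reachable through both C4 and C3.
PARENTS = {
    "C0": [],
    "C1": ["C0"],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C2"],
    "C5": ["C3"],
    "C6": ["C4", "C5"],
}


def walk(starts, parents):
    """Visit each commit reachable from `starts` exactly once, in FIFO order."""
    seen = set(starts)
    queue = deque(starts)
    order = []
    while queue:
        commit = queue.popleft()
        order.append(commit)
        for parent in parents[commit]:
            # Skip parents that are already queued or visited.
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order


if __name__ == "__main__":
    print(walk(["C6"], PARENTS))
```

Starting from C6, the walk reaches C2 first through C4; when it later processes C3, C2 is already in the visited set and is not queued again.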

torek