I'm slightly embarrassed that I have to ask this question as I've been a developer for many years, but something surprised me at work today so I thought I'd clarify what is correct with the community.

I noticed that when we merge PRs into master, all previous commits added as part of that PR are added to master then a merge commit is also added.

For example, if I add a new PR with 1 single commit then merge it - I'm then seeing 2 commits on master, a single one that matches the commit in my PR then another with the exact same changes labelled as "Merge pull request #XXX".

This confused me as I thought what happened was a single merge commit was added with all of the changes from that PR included in that commit.

I'm aware you can squash and merge as well as rebase and merge, but I'm specifically talking about the default option in github when you merge a PR.

I've had a play around since and this looks like it's just the default behaviour, but I'm sure I've never seen this before in my 20 years of professional development so I'm having a bit of a fuss over it.

Is this normal, or am I starting to lose my mind?

  • This sounds like the default behavior. I believe you can configure your GitHub repo to squash merge by default, but that isn't the norm as far as I'm aware – joshmeranda Jun 21 '22 at 17:14
  • "I'm then seeing 2 commits on master" Seeing how? What do you say or do, in order to see what commits are "on master"? (I ask in part because it may be that there's a misunderstanding of what the display is showing.) – matt Jun 21 '22 at 17:32
  • From the main repo page, I click the "X commits" from the right hand side, which directs me to https://github.com///commits/main – ClockworkDev Jun 21 '22 at 17:38
  • Commits are *snapshots*, not change sets. The merge commit represents the code after any merge conflicts are resolved, with both the former head of the target branch and the PR branch head as its parents. If you view a linearization of the merge, it may *look* like master consists of the old head of master, followed by your PR commit, and then the merge commit, but that's ignoring the fact that a Git repository is a directed acyclic graph, not a linked list, of commits. – chepner Jun 21 '22 at 17:38

2 Answers

There are two separate things to consider here:

  • what Git does (or can do), and
  • what GitHub do (or can do).

When you run git merge from the command line, you have several options you can specify:

  • git merge with no options at all;
  • git merge --no-ff;
  • git merge --ff-only; or
  • git merge --squash.

(There are more options but they don't affect the set of commits you'll get.)

In all cases,1 the first step is to compute the merge base for the merge operation. The merge base is defined as the best shared commit: some commit that is on both "branches". I put the word "branches" in quotes here because git merge does not actually operate via branch names, but rather via the commit graph. So if you run git merge develop, for instance, you're not, in an important sense, merging the branch named develop but rather merging the commit found via the branch name develop. (This particular bit of Git is rather confusing, and shows us that when we say "branch", we're not always aware of what we're really saying. See also What exactly do we mean by "branch"?)

Typically, the merge base for a branch looks a bit like this:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

That is, we're on branch br1 (git status will say on branch br1), and the tip-most commit of br1 is commit J. Commit J has, as its parent, commit I, which has commit H as its parent, which has commit G as its parent, and so on. Meanwhile, the name br2 selects commit L, which has K as its parent; K has H as parent, and of course H and G and so on are exactly as already described.

Commit H, in this case, is the best shared commit so it is the merge base.
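
As a rough sketch of the first drawing, the script below builds that graph in a throwaway repository and asks Git for the merge base with git merge-base. The temporary directory, the identity settings, and the use of empty commits are assumptions made only to keep the example short; the commit subjects G through L stand in for the letters in the drawing.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1      # name the initial branch br1
git config user.name "Example"            # placeholder identity for the demo
git config user.email "example@example.com"

for m in G H; do git commit -q --allow-empty -m "$m"; done
git branch br2                            # br2 starts at H
for m in I J; do git commit -q --allow-empty -m "$m"; done
git checkout -q br2
for m in K L; do git commit -q --allow-empty -m "$m"; done
git checkout -q br1

base=$(git merge-base br1 br2)            # best shared commit of the two tips
git log -1 --format=%s "$base"            # prints the subject of the merge base
```

The last command should print H, matching the drawing.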

Or, we might have something that looks more like this:

...--G--H--I--J--N--O   <-- br1
         \     \
          K--L--M--P   <-- br2 (HEAD)

Here, we're "on" branch br2 using commit P. We run git merge br1 to select commit O to merge. The best shared commit is now commit J, which is on both branches because commit M—an earlier merge—makes this commit accessible from commit P: we walk from P to M, then across whichever parent number it is that leads to J (probably parent #2 but we can't prove that from this drawing).

In both cases, we see that the merge base commit is "behind" the tip commits involved in the merge. As a result, this merge must be done as a "true merge": Git must diff the snapshot in the merge base commit, whichever commit that is, against the snapshots in each of the two branch tips, and then combine the work done in the two branches (where, by "branches", we mean "set of commits starting just after the merge base and leading up to the tipmost commits").

If we run git merge --ff-only, Git will error out, saying that this merge cannot be done as a fast-forward.
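
To see that refusal for yourself, here is a minimal sketch, again in a scratch repository with empty commits (both are assumptions for brevity). It rebuilds the diverged br1/br2 graph and records whether git merge --ff-only succeeded; the result variable is just a name chosen for this example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.com"

for m in G H; do git commit -q --allow-empty -m "$m"; done
git branch br2
for m in I J; do git commit -q --allow-empty -m "$m"; done
git checkout -q br2
for m in K L; do git commit -q --allow-empty -m "$m"; done
git checkout -q br1

# The branches have diverged, so a fast-forward is impossible.
if git merge --ff-only br2 >/dev/null 2>&1; then
  result="merged"
else
  result="refused"
fi
echo "$result"
git log -1 --format=%s                    # br1 still points to J
```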

But there's one other common case where merging is possible:

...--G--H   <-- br1 (HEAD)
         \
          I--J   <-- br2

Here, we run git merge br2 and Git finds that the merge base—the best shared commit—is once again commit H. But this time the current commit is the merge base. A full merge would require diffing the snapshot in H against the snapshot in H to see what we changed. This diff would, by definition, be empty: the snapshot in H matches the snapshot in H every time! So there's no work on "our" side to combine with the work done on "their" side, as seen in H-vs-J.

If we force Git to do a real merge using git merge --no-ff, Git will go ahead and do this full merge operation:

...--G--H------M   <-- br1 (HEAD)
         \    /
          I--J   <-- br2

This is what you see happening on GitHub.
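
A small sketch of that forced merge, assuming a scratch repository and empty commits: br2 is strictly ahead of br1, yet --no-ff still produces a new commit M with two parents. The parent_count variable is only there for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.com"

for m in G H; do git commit -q --allow-empty -m "$m"; done
git checkout -q -b br2                    # br2 starts at H and moves ahead
for m in I J; do git commit -q --allow-empty -m "$m"; done
git checkout -q br1

git merge -q --no-ff -m "M" br2           # force a real merge commit

# rev-list --parents prints the commit hash followed by each parent hash.
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "M has $parent_count parents"
git log --graph --format=%s               # shows M joining H and J
```

Note that M's snapshot is identical to J's, since there was no work on "our" side to combine.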

From the command line, though, if we don't force a real merge, or if we forbid one with --ff-only, we get a fast-forward operation instead of a merge:

...--G--H
         \
          I--J   <-- br1 (HEAD), br2

That is, Git takes our current branch name br1 and "slides it forward" to point to the commit we told Git to merge. Git updates our checked-out commit, so that we now have commit J checked out, and updates its own index / staging area, and Git calls this a fast-forward merge even though there's no actual merging involved.2
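
The same starting graph, fast-forwarded instead, can be sketched like this (scratch repository and empty commits are assumptions). Counting commits before and after is one way to confirm that nothing new was created and only the name br1 moved.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.com"

for m in G H; do git commit -q --allow-empty -m "$m"; done
git checkout -q -b br2
for m in I J; do git commit -q --allow-empty -m "$m"; done
git checkout -q br1

before=$(git rev-list --count --all)      # commits in the repository before
git merge -q --ff-only br2                # slide br1 forward to J
after=$(git rev-list --count --all)       # same number afterwards

echo "commits before: $before, after: $after"
git rev-parse br1 br2                     # both names now give the same hash
```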

It's worth pointing out one last case:

...--G--H   <-- br2
         \
          I--J   <-- br1 (HEAD)

Here, if we run git merge br2 Git just says that we're "already up to date". The merge base of H and J is H, and J is ahead of H, so there is nothing to do. We can't even convince Git to make a merge commit whose snapshot matches that in J.3
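
A quick way to check this, sketched in a scratch repository with empty commits (both assumptions): even with --no-ff, the merge leaves HEAD exactly where it was. The exact wording of Git's message varies a little between versions, so the script compares hashes rather than matching the output.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.com"

for m in G H; do git commit -q --allow-empty -m "$m"; done
git branch br2                            # br2 stays behind at H
for m in I J; do git commit -q --allow-empty -m "$m"; done

before=$(git rev-parse HEAD)
git merge --no-ff -m "M" br2              # nothing to merge: br2 is an ancestor
after=$(git rev-parse HEAD)

[ "$before" = "$after" ] && echo "HEAD did not move"
```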


1Except, that is, the -s ours strategy, which skips the merge base computation. It doesn't need one so it just doesn't bother to find one.

2The "work" that git merge --ff-only does here is exactly the same as the work that git checkout does, except that the branch name—here, br1—is moved in a "fast-forward" fashion.

3Not, that is, using git merge. We could do it with git commit-tree and some other plumbing commands. Such a commit has little or no use though, which is why git merge won't do it, even with options.


--squash is different

The above covers most of the options, but not the --squash option. The reason for that is that git merge --squash doesn't make a merge. That is, it merges, but it does not make a merge. This distinction is between to merge, a verb that means to combine work, and a merge, which is a noun formed by "nouning" the adjective merge as found in a merge commit.

A merge commit—or "merge as a noun"—is any commit with two or more parents. The extra parents join up pieces of the Git commit graph, so that what starts as:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

can end up as:

          I--J
         /    \
...--G--H      M--N   <-- br1 (HEAD)
         \    /
          K--L

for instance. This keeps all the commits, but allows us to discard the name br2 once we're done with it. Merge commit M links backwards to both the br1 development line (along the top) and the br2 development (the K-L commit pair along the bottom).
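
To illustrate that the commits survive once the name is gone, here is a sketch under the usual assumptions (scratch repository, empty commits). It makes merge commit M, deletes br2, and then lists everything reachable from br1; commit N from the drawing is left out to keep it short.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.com"

for m in G H; do git commit -q --allow-empty -m "$m"; done
git branch br2
for m in I J; do git commit -q --allow-empty -m "$m"; done
git checkout -q br2
for m in K L; do git commit -q --allow-empty -m "$m"; done
git checkout -q br1

git merge -q -m "M" br2                   # true merge: M has parents J and L
git branch -q -d br2                      # safe: br2 is fully merged into br1

git log --format=%s                       # K and L are still listed
reachable=$(git rev-list --count HEAD)    # G H I J K L M
```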

When we use git merge --squash, we get the same snapshot that we'd get in merge commit M, but the new snapshot has just one parent:

          I--J--S   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

Commit S combines the H-I-J changes with the H-K-L changes, so it does the merge-as-a-verb part, but it has only a single parent J, so it skips the merge-as-a-noun part.

Having made squash commit S, there's only one sensible thing to do with branch br2: we must throw it away, including both of the two commits on it. We delete the name br2. The commits themselves linger (for some time in a regular Git repository, forever on GitHub) and can be found by hash ID until they're swept away as trash (never, on GitHub) but we should not use them any more because they were superseded by squash-commit S.
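
The squash case can be sketched as follows. This one uses real files, because a squash of empty commits would leave nothing to commit. The file names, the commit helper, and the true-merge comparison branch are all hypothetical names invented for the example; true-merge exists only so the snapshot of S can be compared with what a real merge M would have produced.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.com"

# Helper: write the subject into a file and commit it.
commit() { echo "$1" > "$2"; git add "$2"; git commit -q -m "$1"; }

commit G base.txt; commit H base.txt
git branch br2
commit I a.txt; commit J a.txt
git checkout -q br2
commit K b.txt; commit L b.txt

git checkout -q -b true-merge br1         # comparison branch starting at J
git merge -q --no-ff -m "M" br2           # the real merge, for reference

git checkout -q br1
git merge -q --squash br2 >/dev/null      # combine the work, but do not commit
git commit -q -m "S"                      # ordinary commit with one parent

git log -1 --format='%s parents: %p'      # S lists a single parent
git rev-parse 'br1^{tree}' 'true-merge^{tree}'   # the two snapshots match
```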

What GitHub can do

When you use the web interface on GitHub, you're offered a green button, initially labeled MERGE.4 It has, however, a dropdown clicky segment where you can change it to read REBASE AND MERGE or SQUASH AND MERGE.

If you use it in the MERGE mode, it simply runs the equivalent of git merge --no-ff. That is, GitHub has Git do a full merge.5 You always get a merge commit, even if a fast-forward were possible.

If you use it in the SQUASH AND MERGE mode, it runs the equivalent of git merge --squash. That is, you get one new commit and you must6 kill off the merged-in branch.

If you use the REBASE AND MERGE mode, you get something for which there is no single Git command. Instead, GitHub will do the equivalent of a git rebase --force-rebase to forcibly copy all the rebased commits to new ones with a new committer and hence a new hash ID, then do the equivalent of a git merge --ff-only. So you won't get a merge commit, but because the commits that get added have new hash IDs, the person who made the PR must delete his/her/their branch just as for a squash merge. This happens even if the commits would have been fast-forward-able as they stand (which I find annoying, but I don't control GitHub).
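
A local approximation of that sequence, offered as a sketch rather than a description of GitHub's internals: copy the PR commits with git rebase --force-rebase under a different committer identity, then fast-forward the target branch. The rebased branch name, the "Merger" identity, and the file names are assumptions for the example. The PR branch br2 is deliberately left untouched, which is why it ends up pointing at commits with different hash IDs.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.com"

commit() { echo "$1" > "$2"; git add "$2"; git commit -q -m "$1"; }

commit G base.txt; commit H base.txt
git checkout -q -b br2                    # the PR branch, fast-forward-able
commit I a.txt; commit J a.txt

git checkout -q -b rebased br2            # work on a copy of the PR branch
# Copy I and J even though they already sit on top of br1; the different
# committer guarantees the copies get new hash IDs.
GIT_COMMITTER_NAME="Merger" GIT_COMMITTER_EMAIL="merger@example.com" \
  git rebase -q --force-rebase br1 >/dev/null 2>&1

git checkout -q br1
git merge -q --ff-only rebased            # no merge commit is created

git log -1 --format='%h %s' br1           # copied J
git log -1 --format='%h %s' br2           # original J, different hash
```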


4The owner / administrator of a GitHub repository can limit what can be done here, which may change the default, and I think GitHub use browser cookies to remember what you did last time to set your default. But if you haven't set anything up, this is the initial default.

5This is actually done with software that isn't (the C version of) Git itself, but it behaves the same way.

6Git and GitHub do not make you do this, and in theory there are some ways to keep that branch alive and useful, but in practice, you should just delete it.

torek
  • Thank you for your in depth answer, this will take me time to digest but I do appreciate the detail you went into and hope it's helpful for others with a similar query in the future. – ClockworkDev Jun 22 '22 at 13:14

This confused me as I thought what happened was a single merge commit was added with all of the changes from that PR included in that commit.

The behavior you are familiar with is that of a squash merge.

If it helps in not losing your mind, I would think of it like this. The commit on the branch has only one parent commit. All commits apart from merge commits and the initial commit have exactly one parent commit.

When you make the commit on the branch, Git does not know whether you are about to make more commits on the branch, so it creates a regular commit with one parent commit. When you merge the two branches, you need a commit that is different in that it has two parent commits, one from each branch in the merge.

TheIceBear
  • Thanks very much for the explanation. Just surprised me that I've never acknowledged this before, maybe I've just always worked places where squashing is the default! – ClockworkDev Jun 21 '22 at 17:41