Difference between merging develop into master and master into develop?

Question

Difference between merging develop into master and master into develop? I tried to merge develop into master and it gave me 34 files and 1,312 additions and 324 removals and I tried to merge master into develop and it gave me 251 files and 87,123 additions and 1,321 removals. My guess is that Git takes the point where the branch split off from master, collects all the changes made on that branch since then, and compares them with the files in the branch we want to merge into. Am I correct?

It means that for both branches to be the same, we need to merge master into develop and then merge develop into master every time, when both branches have been changed daily for over a month by a dozen developers?

What does git-diff give us? Does it give us all the differences from both branches or what we would get if we tried to merge branch 1 into branch 2?

score 0 · Accepted Answer · answered Apr 13 '22 at 05:54

To understand the answer to that question, let's start with some facts about Git:

Git stores commits, rather than files or branches. The commits are the history in a repository: each one has a unique number (a hash ID or object ID aka OID), and each one holds two things: a full snapshot of every file, plus some metadata. The metadata in any one commit includes a list of previous commit hash IDs, which lets Git relate the later commits back to earlier ones. Most (all "ordinary") commits have just one previous hash ID in them, which links the commit to its parent. This allows Git to work backwards, from the latest commit to the earliest.
Branch names like master or main, develop, br1, feature/tall, etc., just contain one commit hash ID. By definition, whatever hash ID is stored in the name is the latest commit on that branch.

From these two facts alone we can start to visualize commits:

... <-F <-G <-H   <--br1

Here we have a branch name like br1 that selects, or points to, the last commit that is on that branch. That commit has some hash ID that we'll just call H so that we don't have to generate some random looking thing and try to remember it.

Commit H holds a snapshot of all files, but also holds metadata to say who made commit H, why (their log message), and so on. The metadata for commit H stores the hash ID of one previous commit, which we'll just call G. So commit H points backwards to earlier commit G.

Commit G, being a commit, stores a full snapshot, and metadata. The metadata in G make G point backwards to earlier commit F, which is also a commit, so it points backwards to another earlier commit.
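
The backwards-pointing chain can be inspected directly. The following is a minimal sketch in a throwaway repository; the file name file.txt and the commit messages are made up for illustration, and only the branch name br1 comes from the drawing above.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1     # name the initial branch br1
git config user.name "Example"
git config user.email "example@example.invalid"

# Make three commits standing in for F, G and H.
for name in F G H; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

tip=$(git rev-parse br1)       # the one hash ID stored in the name br1 (commit H)
parent=$(git rev-parse br1^)   # the parent recorded in H's metadata (commit G)
grand=$(git rev-parse br1^^)   # G's parent (commit F)

git cat-file -p "$tip"         # raw commit: tree, parent, author, committer, message
git log --format='%h %s' br1   # Git walking backwards from the tip
```

The cat-file output should contain exactly one parent line, which is the link from H back to G.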

When we look at a commit with git diff or git show, we're actually giving Git two commit hash IDs. We start with the commit itself, such as H, maybe using the branch name br1:

git show br1

Git uses that to locate H, then uses H to locate its parent commit G. Git then extracts, to a temporary area in memory, both snapshots, and compares them. We are, after all, interested only in the files that changed. (This is assisted by the fact that commit snapshots de-duplicate file contents, so if H and G share most of their files, Git can tell that instantly and not even bother extracting those files.)

For files that did change, Git figures out a "change recipe"—a diff—and shows that as "what happened". This works great for ordinary commits like commit H. But it breaks down with merges.
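
For an ordinary single-parent commit, the comparison git show makes can be spelled out as an explicit two-commit git diff. This is a sketch only; the file names a.txt and b.txt are invented for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.invalid"

# Commit G: one file.
echo "first version" > a.txt
git add a.txt
git commit -q -m "G"

# Commit H: change a.txt and add b.txt.
echo "second version" > a.txt
echo "new file" > b.txt
git add a.txt b.txt
git commit -q -m "H"

git show --stat --format= br1    # what "happened" in H
git diff --stat br1^ br1         # the same comparison: parent G vs H
```

Both commands should list the same two files with the same line counts, because both compare the snapshot in G against the snapshot in H.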

Merges

To understand git merge, we start with the goal of a merge: to combine work. Let's draw a picture of some commits where there's been a fork of some sort, so that we have two different chains of work, like this:

          I--J   <-- br1
         /
...--G--H
         \
          K--L   <-- br2

That is, there are at this point two "latest" commits. Commit J is latest; and its parent is I, whose parent is H, whose parent is G and so on. But commit L is also latest; its parent is K, whose parent is H, whose parent is G and so on. Here J is the latest br1 commit and L is the latest br2 commit.

As always, every commit holds a full snapshot of all files. To combine work on both branches, we need to find changes. How do we do that? Well, we already know an answer: we can use git diff or git show to pick two commits and compare them.

The thing everyone tries first, which doesn't work, is to pick commits J and L and compare them. But that shows what's different between these two "latest"s, which usually isn't what we want. For instance, maybe Alice made br1 by fixing a typo in the README and adding feature 1, and Bob made br2 by fixing a different typo in some other documentation and adding feature 2. If we diff J vs L, the recipe Git will give us is: remove the fix and feature from Alice, and add the fix and feature from Bob. What we want is to add the fix and feature from Alice, and also add the fix and feature from Bob.

To cut to the chase, the trick here is to start from commit H. That's the best shared commit: a commit that is literally on both branches. By starting at J and working backwards and also starting at L and working backwards, we find that H is the best shared commit. So we diff H vs J to see what Alice did, and then—separately!—diff H vs L to see what Bob did. That gets us the sets of changes to combine.
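
Here is a small sketch of those diffs, which also bears on the git diff part of the question. The file names and the snap helper are hypothetical; git merge-base and the three-dot form of git diff are standard Git.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.invalid"

snap() {   # hypothetical helper: write $2 into file $1, commit with message $3
    printf '%s\n' "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

snap README "shared readme" "H: shared starting point"
git branch br2                                   # both names point to H for now

snap feature1.txt "feature 1" "J: Alice adds feature 1"
git checkout -q br2
snap feature2.txt "feature 2" "L: Bob adds feature 2"

base=$(git merge-base br1 br2)   # the best shared commit, H

git diff --stat "$base" br1      # what Alice did since H
git diff --stat "$base" br2      # what Bob did since H
git diff --stat br1 br2          # tip vs tip: "remove Alice's work, add Bob's"
git diff --stat br1...br2        # three dots: merge base vs br2, same as the second diff
```

So a two-dot or two-argument git diff compares the two tips directly, while the three-dot form shows one side's changes since the merge base, which is one of the two inputs a merge would combine.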

Git will then do its best to combine these change-recipes, using some very simple rules:

If nobody touched a file, use any version of it: all three are the same.
If one branch touches some file and the other branch doesn't touch it at all, use the changed file from the one branch that changed it.
If both touched some file, try to combine the changes, line-by-line. If they're on different, non-overlapping lines, they can be combined. Git adds the rule that the lines must not abut either. If they do overlap, they must be 100% identical. Otherwise, Git will declare a merge conflict and make you, the programmer, clean up the mess.

These rules work surprisingly often, so that git merge can get both sets of changes out of the two branches, apply the combined changes to the snapshot in H, and use that for a new snapshot. Depending on how you like to view this, the result is that we keep the br1 changes while adding the br2 changes, or we keep the br2 changes while adding the br1 changes. Note that, like ordinary mathematical addition, the result is the same regardless of the order of the addends (that is, we don't need to define a separate "augend" vs "addend" because the operation is commutative).¹
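
The commutativity claim can be checked by merging in both directions and comparing the resulting snapshots. A sketch, assuming no extra merge options; the file lines.txt and the scratch branch names try-a and try-b are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.invalid"

# H: a nine-line file shared by both branches.
printf '%s\n' 1 2 3 4 5 6 7 8 9 > lines.txt
git add lines.txt
git commit -q -m "H"
git branch br2

# J on br1: change only the first line.
printf '%s\n' one 2 3 4 5 6 7 8 9 > lines.txt
git commit -q -am "J: edit line 1"

# L on br2: change only the last line (far from line 1, so no conflict).
git checkout -q br2
printf '%s\n' 1 2 3 4 5 6 7 8 nine > lines.txt
git commit -q -am "L: edit line 9"

# Merge in both directions on scratch branches.
git checkout -q -b try-a br1
git merge -q --no-edit br2
git checkout -q -b try-b br2
git merge -q --no-edit br1

# Same snapshot (tree) either way, but two different merge commits.
git rev-parse 'try-a^{tree}' 'try-b^{tree}'
git rev-parse try-a try-b
```

The two tree hash IDs should match while the two commit hash IDs differ, since the commits record their parents in a different order.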

Having come up with a snapshot for the new commit, Git then makes the new commit. You supply a log message in which you explain why you did the merge (or you use the crappy default message, Merge branch 'br2' into br1 for instance, which is what most people really do) and Git makes a new commit that is just like any commit: it has a snapshot and metadata. What makes the new commit special is that instead of just one parent, it has two:

          I--J
         /    \
...--G--H      M
         \    /
          K--L

Note that I have filed the branch names off this picture. Whenever you make any new commit—whether with git commit, or git merge, or git cherry-pick or git revert or whatever—Git will update the current branch name automatically for you, so that M is now the latest commit. But which branch name gets updated? Well, that depends on which branch name git status said you were on:

$ git status
On branch br1
...
$ git merge br2

results in:

          I--J
         /    \
...--G--H      M   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

That is, you were "on" br1—that's what the HEAD attached to br1 means here—and you still are, so the new commit is also the latest for br1. But if git status said that you were On branch br2 and you had run git merge br1, it would be the name br2 that is updated.
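
A quick way to see which name moved is to record both tips before merging. This is a sketch with invented file names and a hypothetical snap helper.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.invalid"

snap() {   # hypothetical helper: write $2 into file $1, commit with message $3
    printf '%s\n' "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

snap base.txt "base" "H"
git branch br2
snap one.txt "one" "J"
git checkout -q br2
snap two.txt "two" "L"

git checkout -q br1
old_br1=$(git rev-parse br1)     # commit J
old_br2=$(git rev-parse br2)     # commit L

git symbolic-ref --short HEAD    # the branch we are "on": br1
git merge -q --no-edit br2       # makes merge commit M

git rev-parse br1 br2            # br1 now names M; br2 still names L
git log --oneline --graph --all
```

Only the name HEAD is attached to advances; the other name keeps pointing at its old tip.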

¹Note that adding options to git merge, such as -s ours or -X theirs for instance, changes this: the operation is no longer commutative.

This answers the first part of your question

[What is the d]ifference between merging develop into master and master into develop?

One will advance the name master, and the other will advance the name develop. That is, you'll have either:

       o--...--o
      /         \
...--o           M   <-- master (HEAD)
      \         /
       o--...--o   <-- develop

or:

       o--...--o   <-- master
      /         \
...--o           M   <-- develop (HEAD)
      \         /
       o--...--o

when you're done. The snapshot in M will be the same either way.

The rest is either unimportant or crucial

Since the commits are what matter, the name you use to access the result of the merge is not very important. Well, except for one thing: if you keep using that name to make more commits, and want to access these "more commits", the name you had Git advance is absolutely critical.

That is:

       o--...--o
      /         \
...--o           M--N   <-- master (HEAD)
      \         /
       o--...--o   <-- develop

and:

       o--...--o
      /         \
...--o           M   <-- master
      \         /
       o--...--o--N   <-- develop (HEAD)

are entirely different pictures. That's because we make a new commit by starting with the snapshot from some existing commit, changing a few things, and committing the result. So what's in N's snapshot will depend on whether you started from commit M—the merge result—or not.

But note that we can rename the two branches any time:

       o--...--o
      /         \
...--o           M   <-- smorgas (HEAD)
      \         /
       o--...--o   <-- bord

It's not the names that matter, it's the commits. So you should arrange the commits the way you want, and use names that help you achieve your goals. To do that properly, you'll need a good mental grasp of what your goals actually are in the first place. All that matters to Git are the commits; the branch names are just ways that Git helps you find the right OIDs, so that you can find the commits you care about.

The rest of it

I tried to merge develop into master and it gave me 34 files and 1,312 additions and 324 removals and I tried to merge master into develop and it gave me 251 files and 87,123 additions and 1,321 removals.

Any time Git says "A files added, R files removed, C files changed" (plus the number of lines added or removed to/from the changed files), Git has run git diff. When we have this:

...--F--G--H--I--J

and we're using commit H with git show, it's simple enough here: Git is comparing commit G, the parent of H, with commit H, to get that set of numbers. But when we have:

          I--J
         /    \
...--G--H      M   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

and Git is trying to tell us something about commit M, well, M has two parents, not just one. So which one did Git use in the git diff it ran to come up with the numbers?

The answer here is that Git uses the first parent of the merge. When you run:

git switch somebranch
git merge otherbranch

and get a new merge commit that goes on somebranch, its first parent is the commit that was at the tip of somebranch just a moment ago, and its second parent is the commit that is (still) at the tip of otherbranch. So you'll get different numbers here if we switch to br2 and merge br1. If we add the parent numbers to the diagram, we can show how this works visually:

          I--J
         /    \₁
...--G--H      M   <-- br1 (HEAD)
         \    /₂
          K--L   <-- br2

vs:

          I--J   <-- br1
         /    \₂
...--G--H      M   <-- br2 (HEAD)
         \    /₁
          K--L

So you'll see different statistics even though the snapshot in M is the same either way, because Git is comparing M vs either J or vs L.
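
The effect on the statistics can be reproduced on a small scale. In this sketch, br1 adds one file and br2 adds two; the file names, the snap helper, and the scratch branch name are assumptions for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.invalid"

snap() {   # hypothetical helper: write $2 into file $1, commit with message $3
    printf '%s\n' "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

snap base.txt "base" "H"
git branch br2
snap one.txt "one" "J"           # br1 adds one file
git checkout -q br2
snap two.txt "two" "K"           # br2 adds two files
snap three.txt "three" "L"

j=$(git rev-parse br1)
l=$(git rev-parse br2)

# Merge br2 into br1: first parent is J.
git checkout -q br1
git merge -q --no-edit br2
git diff --shortstat 'HEAD^1' HEAD   # M vs J: the two files br2 brought in
git diff --shortstat 'HEAD^2' HEAD   # M vs L: the one file br1 brought in

# Merge the other way on a scratch branch: first parent is L.
git checkout -q -b scratch "$l"
git merge -q --no-edit "$j"
git diff --shortstat 'HEAD^1' HEAD   # now the first-parent diff shows one file
```

The first-parent comparison reports two files in one direction and one file in the other, even though both merges end up with the same snapshot.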

It means that for both branches to be the same, we need to merge master into develop and then merge develop into master every time ...

You can do this, but watch out! Not every git merge actually makes a merge commit. If you force git merge to make a merge commit, and you start with:

          I--J
         /    \
...--G--H      M   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

and run git switch br2 && git merge --no-ff br1, you get this:

          I--J
         /    \
...--G--H      M   <-- br1
         \    / \
          K--L---N   <-- br2 (HEAD)

where the first parent of N is L and the second parent of N is M. Commits M and N will (generally²) have the same snapshot, but they have different numbers and are different commits.

But git merge doesn't always make a merge commit. Consider the following branch structure:

...--G--H   <-- main (HEAD)
         \
          I--J   <-- feature

Here we used two commits to develop a new feature. We're now ready to merge it. If we run git merge feature, a true merge would have to:

find the merge base: that's commit H;
diff the snapshot in H against our current snapshot H to see what we changed, but that's obviously nothing at all because H is H;
diff the snapshot in H against their snapshot in J to see what they changed;
add the non-changed stuff in ours to the changed stuff in theirs and apply those changes to what's in H, giving their snapshot; and
make a new merge commit.

The result looks like this:

...--G--H------M   <-- main (HEAD)
         \    /
          I--J   <-- feature

and the snapshot in M matches that in J. But Git takes a shortcut. It says to itself: Well, duh, if I diff commit H against itself, that's obviously empty. The merge result will have the snapshot from J. What if I just use commit J? If you allow it to do so, git merge will then use commit J, giving:

...--G--H
         \
          I--J   <-- feature, main (HEAD)

after which there's no reason to bother with the kink in the drawing:

...--G--H--I--J   <-- feature, main (HEAD)
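
Both outcomes can be produced side by side. In this sketch the branch name forced, the file names, and the snap helper are invented; --no-ff and --ff-only are the standard git merge options that require or forbid a real merge commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

snap() {   # hypothetical helper: write $2 into file $1, commit with message $3
    printf '%s\n' "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

snap base.txt "g" "G"
snap base.txt "h" "H"
git checkout -q -b feature
snap feature.txt "part 1" "I"
snap feature.txt "part 2" "J"

# True merge on a scratch copy of main: a new commit M with two parents.
git checkout -q -b forced main
git merge -q --no-ff --no-edit feature

# Fast-forward on main itself: no new commit, main simply moves to J.
git checkout -q main
git merge -q --ff-only feature

git log --oneline --graph --all
```

Afterwards main and feature name the same commit J, while forced names a separate merge commit whose snapshot matches J's.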

If you let Git do it, this fast-forward is what you'll get if, having already merged develop into master, you switch to develop and run git merge master:

       o--...--o
      /         \
...--o           M   <-- develop (HEAD), master
      \         /
       o--...--o

There's now just the one commit M that is the latest on both branches.

If you now make a new commit on develop at this point, you get:

       o--...--o
      /         \
...--o           M   <-- master
      \         / \
       o--...--o   N   <-- develop (HEAD)

Note how the parent of commit N is commit M: it is an ordinary single-parent non-merge commit. This is the same thing that you get if you first delete the name develop while you have this:

       o--...--o
      /         \
...--o           M   <-- master (HEAD)
      \         /
       o--...--o   [the name develop *was* here, but now is deleted]

then create a new develop from commit M with:

git branch develop    # or git switch -c develop

so that the new name points to commit M.
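
The equivalence of the two routes can be checked in a scratch repository. This is a sketch: the name ff-copy, the file names, and the snap helper are hypothetical, and ff-copy exists only so that both routes can be tried from the same starting point.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

snap() {   # hypothetical helper: write $2 into file $1, commit with message $3
    printf '%s\n' "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

snap base.txt "base" "base"
git checkout -q -b develop
snap dev.txt "develop work" "work on develop"
git checkout -q master
snap rel.txt "master work" "work on master"
git merge -q --no-edit develop       # merge commit M, now the tip of master

# Route 1: fast-forward a copy of develop up to M.
git branch ff-copy develop
git checkout -q ff-copy
git merge -q --ff-only master

# Route 2: delete develop and re-create it at M.
git checkout -q master
git branch -q -D develop
git branch develop

git rev-parse master develop ff-copy  # all three names point to M
```

Either way the name ends up on commit M, and no new commit is created.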

Some web hosting sites (e.g., GitHub) won't let you do a fast-forward not-really-a-merge operation through their web interface. For these sites, you must use command-line Git, or delete and re-create branch names, to get this particular effect if you want it. When and whether you want it is your choice though!

²You can defeat this in many ways, including both options and the so-called evil merge.

Conclusion

You need to know both what Git actually does and what you want. People come up with all kinds of fancy branching models for Git, but just picking one and following it without understanding why someone chose that model could get you into trouble. The point of a branch name in Git is to find the latest commit automatically through a human-readable name. Because humans choose and use those names, you need to understand what the humans are thinking; that, not Git itself, is what makes this truly hard.