
I have a somewhat complicated multi-commit cherry-pick, and I wanted to see what all of the changes would look like taken together. So I tried git diff A B C D E F G H I J K. It looks like it did what I asked for. I looked up what that actually does in the docs, but I'm a little unsure whether it is what I'm asking for, as the docs say (emphasis mine):

git diff [<options>] <commit> <commit>…​ <commit> [--] [<path>…​]

This form is to view the results of a merge commit. The first listed must be the merge itself; the remaining two or more commits should be its parents. A convenient way to produce the desired set of revisions is to use the ^@ suffix. For instance, if master names a merge commit, git diff master master^@ gives the same combined diff as git show master.

Does this actually do the equivalent of showing what the changes are if all of the commits were done, excluding other changes that are not relevant to those changes? (There are changes that were done in between some of these changes)

The topology of the git history is:

* L - (11 months ago)
* K - (11 months ago)
*   l - (11 months ago)
|\
| * m - (11 months ago)
* | n - (11 months ago)
|/
* o - (11 months ago)
*   p - (11 months ago)
|\
| * q - (11 months ago)
| * J - (11 months ago)
* | r - (11 months ago)
|/
*   s - (11 months ago)
|\
| *   t - (11 months ago)
| |\
| * | I - (11 months ago)
| * | H - (11 months ago)
| * | G - (11 months ago)
| * | F - (11 months ago)
| * |   u - (12 months ago)
| |\ \
| * | | E - (12 months ago)
| * | | D - (12 months ago)
| * | | C - (12 months ago)
| * | | B - (12 months ago)
* | | | v - (11 months ago)
| |_|/
|/| |
* | | w - (11 months ago)
...
* x - (1 year ago)
*   y - (1 year ago) 
|\
* | A - (1 year ago) 
* | z - (1 year ago) 

An excerpt of the diff shows:

  -----                var sw = new Stopwatch();
  -----                sw.Start();
       ++++            var result = SystemGetTheatreInformationAsync(localTMSClientManager);
  -----
  +++++                //var sw = new Stopwatch();
  +++++                //sw.Start();
       ----            var result = SystemGetTheatreInformationAsync(localTMSClientManager);
       ----

  ---------            if (!result.Wait(TimeSpan.FromSeconds(45)))
  +++++++++            if (!result.Wait(TimeSpan.FromSeconds(90)))    // Have seen this take up to 56 seconds
                       {
                           log.Warn($"{MethodBase.GetCurrentMethod().Name} took too long to execute (timeout exceeded).");
                       }
                       else
                       {
  -----                    sw.Stop();

and what is curious are the leading +s and -s which I'm interpreting as additions and removals at individual commits, but what are the spaces signifying? If they are signifying no change, then I would have expected it to show maybe at the beginning, but wouldn't have expected to see them at the middle or end.

Is this command showing the diff as if it was merged as one or what exactly is it showing?

Adrian
  • I doubt you literally said `git diff A B C D ...`. So you are asking us to interpret the results of a command you haven't even shown us, in a topology that you have not displayed or described. – matt Jun 28 '22 at 12:19
  • I added this particular bit of documentation, in an attempt to describe how `git diff` implements what `git show` implements. A TL;DR variant is "this doesn't work unless the commit itself is a merge" and then "see the combined diff section of the documentation". – torek Jun 28 '22 at 12:25
  • Diffs show _hunks_. A hunk is a stretch of the file where there is a change but it may have an unchanged line or lines in its middle so those are displayed without the `+-` notation. The vast majority of the file is not displayed at all because it is not a hunk with changes. – matt Jun 28 '22 at 12:26
  • Combined diffs in particular compare some segment(s) of some file(s) against all the parent commits' variants of that same file. If any parent variant exactly matches the file, the diff drops the file entirely. Only if all parents differ does it bother to do the comparison in the first place, and then it shows areas where something "interesting" happened during merging (someone resolved a conflict). – torek Jun 28 '22 at 12:28
  • None of this is remotely close to what you probably want. What you want is to *do the cherry picking* and then compare the final result to what you had before you started the cherry picking. There's no shortcut for this. – torek Jun 28 '22 at 12:29
  • Well, kind of - I didn't touch the combined diff code (I think maybe someday I might want to, it never does quite what *I* want, but I haven't really formulated in my head what exactly I *do* want out of it). – torek Jun 28 '22 at 12:37
  • @matt, updated to show history and command. – Adrian Jun 28 '22 at 12:44
  • @torek, I know how to cherry-pick, but exactly what would I do to do this? Go to `z` and then cherry-pick `A B C D E F G H I J K`? – Adrian Jun 28 '22 at 12:52
  • Yes: start with whatever starting snapshot you want, then add each commit with a cherry-pick. Note that `git diff` generally only compares two snapshots; when using a combined diff for a merge, it compares the final resulting snapshot (the merge commit) against each of the input commits (assumed to represent inputs to `git merge`, e.g., two branch-tip commits being merged). – torek Jun 28 '22 at 13:18
  • @matt re "Diffs show hunks". So is each symbol `+-` and space to indicate that, relative to... the previous commit? There's an addition/deletions/no change? If that's the case, I would have expected only a single addition or deletion, not several in a row for a particular line. Or is it relative to the first commit listed? That sounds more plausible. – Adrian Jun 28 '22 at 13:18
  • A diff tells you at the top what each column means. – matt Jun 28 '22 at 13:19
  • @matt, you mean the `index ff66999b0,ff66999b0,f0fcee90a,f0fcee90a,f0fcee90a,be39896cf,e0169f18e,3876b190e,911e31c40,0c8516fa6,0c8516fa6..c6358b908`? These hashes don't seem to match any of the commit hashes that I specified. – Adrian Jun 28 '22 at 13:23
  • I can try to write up a more detailed description of combined diffs later, but not for some number of hours at least. – torek Jun 28 '22 at 13:30
  • By the way, those `index` lines are for `git am` or `git apply -3`. They have no function in the diff you're reading: the apply and am code cannot deal with combined-diff diff hunks in the first place. – torek Jun 30 '22 at 10:06

1 Answer


TL;DR: "Combined diffs"

When you invoke git diff this way—by naming three or more commits on the command line—you're invoking the same internal machinery that git show uses for showing a merge commit. This depends on the fact that each commit stores a single Git "tree" object. A merge commit has two or more parent commits, each of which also has a tree object, so when git show is handed a merge commit hash ID, it has three or more tree objects to compare, while the basic difference-engine algorithm can only take two at a time. It therefore does ... something, and knowing what that "something" is, is useful, every time you see one of these things. Git calls this something a combined diff. You don't have to memorize the details of combined diffs: just look them up in the manuals. Do, however, remember that the Git documentation splits up two key facts about combined diffs:

  • Combined diffs omit some files entirely.
  • Combined diffs can omit some diff hunks entirely.

When reading the manual pages, remember to search for both sections about combined diffs.

Note that you get combined diffs for a plain, no-arguments-provided git diff command when you're in the middle of an incomplete merge operation, too. However, in this case, the sources for the to-be-diffed files are your working tree and Git's index, rather than multiple commits. This answer does not cover these details.

Short-ish

Git adds special git diff logic in an attempt that (in my opinion at least) works sort-of-OK for some common cases where we want to see why a merge commit has the tree it has, and not some other tree we might have expected. This attempt has some flaws, and since merge-ort became the default merge strategy (in Git 2.34) and it has some new features, there may someday be a better way to do this, but for now, git show of a merge can sometimes help you figure out what happened. The mechanism it uses is to run git diff n times, where n is the number of parents of the merge commit, then combine parts of the results, and discard other parts of the results, to form a combined diff. This makes sense for showing a merge. It makes less sense for showing a non-merge commit (and no sense for your original cherry-pick purpose).

The files that get omitted entirely from a combined diff are those where the merge commit's version of the file exactly matches the version in at least one of the parent commits. The general idea here is that whoever did the merge must have thought that one of the two parents had the right code. The flaw in this general idea is that perhaps the person who made the merge was an idiot. (Or, to be a bit fairer, perhaps the person who made the merge didn't realize that they needed to look at the other commit's changes. This actually happens surprisingly often in real life, via people who were not properly taught how to use Git's merge operation.)
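The file-omission rule above can be sketched in a few lines of Python. This is a toy model, not Git's code: snapshots are plain dicts from path to content, and the function name `files_in_combined_diff` and the sample snapshots are made up for illustration.

```python
def files_in_combined_diff(result, parents):
    """Return paths whose content in `result` differs from every parent.

    `result` and each entry of `parents` are toy snapshots: dicts mapping
    a path to file content. A missing key means the file is absent.
    """
    paths = set(result)
    for parent in parents:
        paths |= set(parent)
    shown = []
    for path in sorted(paths):
        final = result.get(path)
        # The file survives only if NO parent already has this exact version.
        if all(parent.get(path) != final for parent in parents):
            shown.append(path)
    return shown


# Hypothetical snapshots for a merge M with parents J and L.
J = {"a.txt": "red", "b.txt": "one", "c.txt": "x"}
L = {"a.txt": "blue", "b.txt": "two", "c.txt": "x"}
M = {"a.txt": "blue", "b.txt": "one and two", "c.txt": "x"}

print(files_in_combined_diff(M, [J, L]))
```

Here a.txt drops out because M took L's version verbatim, and c.txt drops out because nobody touched it. Only b.txt, which matches neither parent, would be shown.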

In any case, the git diff command line syntax has always allowed you to invoke the combined-diff code on any number of Git tree objects of your choice. When I was fixing a small bug in git diff, I updated the usage documentation (see commit b7e10b2ca210d6a3647910fdecea33581e4eaf0d) to mention that this is how you can get git diff to do what git show does.

The actual operation uses the commits' trees, and you can run git diff on what Git calls a <tree-ish>. That is, git diff HEAD HEAD^1 HEAD^2 operates using HEAD^{tree} as the merge-commit's tree and HEAD^1^{tree} and HEAD^2^{tree} as the other two trees. You can invoke it this way. But that's something of an accident of the implementation: if we documented it formally, we'd never be able to change this. There's some tension between documenting "what Git really does" and "what Git logically should be doing", and in this case, I felt that consistency favored the "logically should be doing", so that's what's in the linked commit.

Although I don't think anyone should use this mechanism with a non-merge commit, reading the long description below will allow you to understand what you see. You can decide whether what you see has any use to you: Git is a set of tools, not a solution, and you can plug the power screwdriver into the bandsaw even if that doesn't make sense. But by not documenting that you can do this, we try to keep people away from it. In a longer article like this one, though, I go right into the actual mechanism, so if you can do it, I show you how: I just tell you "don't trust it too much".

Long

You're ultimately interested in cherry-picking multiple commits, but you've asked about git diff. This is a bit of an XY problem: cherry-picking is actually a kind of merge, which is not just a diff, and cherry-picking multiple commits means doing multiple merges, not one big merge. Still, the question you are asking is a valid question. It has an answer. Here is that answer.

Warning: this gets a bit long. Let's start with the easy part:

Does this actually do the equivalent of showing what the changes are if all of the commits were done ...

No. All of the commits you feed into this kind of git diff are already done! There's no "if they were" about it. They are. They are not "proposed changes to make", they are existing commits. Moreover, no commit is a change in the first place!

Commits are snapshots; diffs are comparisons of snapshots

Let's take that first claim and refine it a bit. A commit is a snapshot plus metadata, so "commits are snapshots" is an incomplete statement: true as far as it goes, but missing the "plus metadata" part. The "snapshot" part is what we're concentrating on here though, so while we should keep the "plus metadata" in mind, let's go with the "snapshot" part:

Every commit has a full snapshot of every source file. More precisely, it has the source files it has: any source files it lacks means, in effect, "when extracting this snapshot, make sure to remove other files". Think of each commit as an archive (tarball, WinRAR, zip archive, whatever). If you downloaded and installed that archive, you would have those files, and no other files. That's the snapshot in the commit.

(The actual format of this snapshot is very special and Gitty, such that when we make thousands, or millions, of snapshots of some project, it hardly takes any more space than just one snapshot, or maybe a few tens or hundreds of snapshots. Git achieves this through de-duplication of snapshotted files, plus delta compression of internal Git objects that gets applied later in the process. We don't need to worry about any of this: that's all invisible to us, except in terms of savings on disk space and network bandwidth when we clone the repository.)

So, given any two commits, we have two snapshots. If the two commits are "near" each other, they are like two film frames. We can take those snapshots and place them side by side, and play a game of Spot the Difference. Did the dog move? Maybe his fur color changed! Look, the hands on the analog clock moved: an hour passed between the two snapshots!

Instead of completely writing down the two snapshots, we can express the difference between them, as a git diff. Git's git diff is inspired by the old context diff and unified diff formats from the traditional Unix diff command, descended from Doug McIlroy's original implementation (see the Hunt–Szymanski or Hunt-McIlroy algorithm, though Git uses a variant of the Myers algorithm: see Myers diff algorithm vs Hunt–McIlroy algorithm). If we use this algorithm on adjacent commits—commits with a parent/child relationship, in Git—we see a representation of the change that some human made.
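To make the "Spot the Difference" idea concrete, here is a minimal Python sketch that compares two snapshots at whole-file granularity. The dict-based snapshots, the file names, and `compare_snapshots` are assumptions for illustration; real Git goes on to compare the lines inside each modified file.

```python
def compare_snapshots(old, new):
    """Classify each path as added, deleted, or modified between two snapshots."""
    changes = {}
    for path in sorted(set(old) | set(new)):
        if path not in old:
            changes[path] = "added"
        elif path not in new:
            changes[path] = "deleted"
        elif old[path] != new[path]:
            changes[path] = "modified"
        # Identical files are simply not mentioned, as in a real diff.
    return changes


# Two made-up snapshots: a parent commit and its child.
parent = {"README": "hello\n", "clock.txt": "1 o'clock\n", "old.txt": "x\n"}
child = {"README": "hello\n", "clock.txt": "2 o'clock\n", "dog.txt": "woof\n"}

print(compare_snapshots(parent, child))
```

Neither snapshot stores a "change". The change only appears when the two snapshots are compared.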

Sidebar: a diff isn't necessarily what a human did

Note that we don't necessarily see the actual change some human made. To take a trivial example, suppose someone has this as their original file:

Paris in the
the
the
spring

The human deletes one of the three redundant words the: perhaps line 2. The computer says: "delete the last of the three redundant words the", i.e., delete line 3. That's not the same thing, but it produces the same result.
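A tiny Python check shows why no diff tool can tell these two edits apart: the before and after snapshots are identical either way. The helper `delete_line` is just an illustration.

```python
original = ["Paris in the", "the", "the", "spring"]


def delete_line(lines, number):
    """Return a copy of `lines` without the 1-based line `number`."""
    return lines[:number - 1] + lines[number:]


human_result = delete_line(original, 2)    # what the human actually did
machine_result = delete_line(original, 3)  # what the diff reports

# Both edits produce the same final file, so the snapshots alone
# cannot reveal which one really happened.
print(human_result == machine_result)
```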

More commonly, when we have languages that use balanced braces and/or parentheses as part of their construction, we might have:

if repeatable_test {
    thing1 // may change the condition, so that testing again
    // produces a different result
}
if repeatable_test {
    thing2 // may change the condition
}

Someone might insert:

if repeatable_test {
    thing3
}

between the first and second test, and our diff algorithm might present this as a change of the form:

 if repeatable_test {
      thing1
 }
 if repeatable_test {
+    thing3
+}
+if repeatable_test {
     thing2
 }

This change achieves the same result. But it's not what the human did. To the machine, there's no obvious way to choose which diff to use. Git recently (version 2.14) picked up a default diff indent heuristic for display to help out here, but it is not (yet?) used in merge, and this can cause problems during cherry-picking when Git picks the wrong set of "changed lines". (It's not all that common for it to cause problems, and indeed, it's not all that common to see it in the first place. That, plus the fact that the indent heuristic is non-obvious and doesn't always work, is why it took until Git 2.14 for Git to acquire it.)

Git merges

Before we cover "combined diffs", we really need to note some things about git merge. The key insights are these two:

  • Merging is about combining work. This means we need to define "work".

  • A merge commit, in Git, is a commit with special metadata. In particular it has two or more parent commits. There is nothing special about the snapshot in a merge commit. It is just the same ordinary snapshot as in any other commit.

Let's look briefly now at some of the metadata in each commit.

Every commit has a unique hash ID. The hash ID is a big, ugly, random-looking string of letters and digits, such as e4a4b31577c7419497ac30cebe30d755b97752c5. This is actually a very large number expressed in hexadecimal. The number isn't actually random: it's a cryptographic checksum of the raw commit data, so that every piece of Git software anywhere in the universe will compute the same hash ID for the same commit. That way, two separate implementations of Git, working with two separate repositories, can talk to each other and find out which repository or repositories has some particular commit, just by comparing hash IDs. This clever trick resides at the heart of Git's distributed nature, making it efficient to have distributed clones of repositories. All we really need to know, though, is that the hash ID uniquely identifies some particular commit. Git needs this hash ID; if we can give Git the hash ID, Git can tell if it has the commit, and if it does have the commit, Git can get, use, and display the commit. If our Git—our software working with our repository—doesn't have the commit, we hook ours up to some other Git that does, and get it, and then we're good.
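As a rough illustration of "a cryptographic checksum of the raw commit data", the sketch below computes an object ID the way SHA-1-based Git repositories do: hash a short header (object type and byte length), a NUL byte, and then the raw body. The commit body here is invented (the author, dates, and message are placeholders), and repositories using the newer SHA-256 format would differ.

```python
import hashlib


def git_object_id(kind, body):
    """SHA-1 object ID: hash of '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Made-up commit body; names, dates, and message are placeholders.
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1656417600 +0000\n"
    b"committer A U Thor <author@example.com> 1656417600 +0000\n"
    b"\n"
    b"initial commit\n"
)

first = git_object_id("commit", commit_body)
second = git_object_id("commit", commit_body)

# Same bytes in, same ID out, on any machine.
print(first == second, len(first))
```

Changing a single byte of the body (a parent hash, a date, a letter in the message) gives a completely different ID, which is why two repositories can compare commits just by comparing hash IDs.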

So: each commit has a snapshot plus metadata, and in the metadata for any one given commit, Git stores a list of previous commit hash IDs. Most commits have exactly one previous-commit-hash-ID in this list. Such a commit is an ordinary commit: it has one parent, and Git uses the stored hash ID in the commit to get and use the parent.

Being able to get the commit itself—the child—and the parent gives Git two commits, and now Git can play the Spot the Difference game and show us a diff. That's what we see when we run:

git show main

for instance. Git uses the name main (or in my case below, the special magic name HEAD) to find a hash ID like e4a4b31577c7419497ac30cebe30d755b97752c5, uses hash ID e4a4b31577c7419497ac30cebe30d755b97752c5 to find parent commit 49c837424a6152618aad42fa6d5083c6be1fa718, and uses the pair so that we get:

$ git show
commit e4a4b31577c7419497ac30cebe30d755b97752c5 ...

diff --git a/GIT-VERSION-GEN b/GIT-VERSION-GEN
index 120af376c1..b210b306b7 100755
--- a/GIT-VERSION-GEN
+++ b/GIT-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=GIT-VERSION-FILE
-DEF_VER=v2.37.0-rc2
+DEF_VER=v2.37.0
 
 LF='
 '

That's fine for an ordinary commit: the changes in the commit are those from the parent to the commit. But this also leads us to how merge works.

Let's draw a series of commits in a new, nearly-empty repository. Let's say we have just three of them, and for simplicity in our drawing, let's pretend their hash IDs are A, B, and C in that order. Then we have:

A <-B <-C   <--main

The name main provides the hash ID of the latest commit C. Commit C stores a snapshot and metadata, and the metadata give Git the hash ID of commit B. Commit B stores a snapshot and metadata, and B's metadata give Git the hash ID of commit A. Commit A stores a snapshot and metadata ... well, you get the idea, but let's note that A is the very first commit. As such, it has no parent, so its list of parent hash IDs is just empty. This allows a program like git log, which works backwards from the end to the beginning, to stop.
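The backwards walk can be modelled with two small dictionaries. This is only a sketch of the idea: the `parents` and `branches` tables and `walk_first_parents` are illustrative names, and the real git log handles multiple parents, ordering, and much more.

```python
# Each commit's metadata lists its parent hash IDs; A is the root commit.
parents = {"A": [], "B": ["A"], "C": ["B"]}

# A branch name just holds the hash ID of the latest commit.
branches = {"main": "C"}


def walk_first_parents(start):
    """Follow parent links backwards until a commit has no parent."""
    history = []
    commit = start
    while commit is not None:
        history.append(commit)
        links = parents[commit]
        commit = links[0] if links else None  # empty list: stop here
    return history


print(walk_first_parents(branches["main"]))
```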

Now suppose time has passed and we have more commits (perhaps as many as eight!) and our drawing now looks like this:

...--F--G--H   <-- main

The name main now locates commit H, which points back to earlier commit G, which points back to F, and so on. For various reasons I've grown lazy about drawing the arrows between commits, but still use an arrow coming out of a branch name to show where the branch name points.

Let's now make a new branch name, br1, that also points to commit H, like this:

...--F--G--H   <-- br1, main

Note that all the commits are on both branches. We do, however, now need a way to know which name we're using. To help out, Git uses the special name HEAD, written in all uppercase like this: it "attaches" this special name to one branch name. If we are "on" main—if we have run git checkout main or git switch main—then HEAD is attached to main:

...--F--G--H   <-- br1, main (HEAD)

If we run git switch br1, to switch to branch br1, we get:

...--F--G--H   <-- br1 (HEAD), main

Either way we're using commit H, but we're using it through a different name.

Now suppose we add one new commit, in the usual way (modify some files, git add, and git commit). We get a new commit, with a new, unique hash ID: we'll call this I for short, and draw it in:

             I   <-- br1 (HEAD)
            /
...--F--G--H   <-- main

Note how HEAD is still attached to the name br1, but the name br1 now points to I instead of H. If we make a second new commit we get:

             I--J   <-- br1 (HEAD)
            /
...--F--G--H   <-- main

If we "switch back" to main (with git switch main or git checkout main), we get:

             I--J   <-- br1
            /
...--F--G--H   <-- main (HEAD)

Git removes the commit-I files—they're safely archived in commit I for later recovery—and installs the commit-H files for us to work on / with. We can now create another branch name br2, or just use main.

I'll go ahead and create and switch to a new name br2 and then make yet another new commit K, to get this:

             I--J   <-- br1
            /
...--F--G--H   <-- main
            \
             K   <-- br2 (HEAD)

Adding yet another commit L gives me:

             I--J   <-- br1
            /
...--F--G--H   <-- main
            \
             K--L   <-- br2 (HEAD)

This is how branches work (and grow) in Git. But now that we have some branches, we might want to use git merge. And, as we noted above, merging is about combining work. But what work did we do on branch br1? How will we know? What work did we do on branch br2? How do we combine this work?

We could try using git diff on commits J and L, but that's going to be wrong. Suppose that in H, we described, in some text file, a red ball, and by commit J we had changed it to a blue ball. Meanwhile in the K-L series of commits, we left it alone. A diff from J to L will say that we should change blue ball back to red ball. That's not right!

Meanwhile, maybe on the H-K-L line we found that RED and BLUE needed to be qualified: spelled out as COLOR_RED and COLOR_BLUE in some code files. We want to keep those changes too. If we compared L to J, it would say to change those back, and that's not right either.

What we need is to somehow compare what's in commit H—the starting point where we began "new work" on br1—to what's in commit J, to see what work we did on br1. Then, using the same starting commit, we can compare what's in H to what's in L, to see what work we did on br2.

Commit H in this case is the merge base, and doing these two sets of comparisons is how merging works. We diff H twice: once against J to see what changes happened on the H-I-J path, and once against L to see what changes happened on the H-K-L path.
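One way to picture how H gets found: collect every commit reachable from J, every commit reachable from L, and keep the common ancestors that are not themselves ancestors of another common ancestor. The Python below is a simplified sketch over a hand-written graph, not Git's actual merge-base code; the `parents` table and function names are assumptions.

```python
# Toy commit graph matching the drawing: G--H, then H--I--J and H--K--L.
parents = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}


def ancestors(commit):
    """All commits reachable from `commit`, including itself."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(left, right):
    """Common ancestors that are not proper ancestors of another common one."""
    common = ancestors(left) & ancestors(right)
    return sorted(
        c for c in common
        if not any(c != other and c in ancestors(other) for other in common)
    )


print(merge_bases("J", "L"))
```

G is also a common ancestor, but it is reachable from H, so H is the better (closer) starting point.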

We then simply (or complicatedly) have Git combine these two sets of changes. If we changed red ball to blue ball, and changed if color == RED to if color == COLOR_RED, and likewise BLUE to COLOR_BLUE, Git will try to keep both changes. In some cases, these two changes will overlap (touch the same lines of the same files) and Git will declare a merge conflict. If Git doesn't see any conflicts (if no diff lines overlap, more or less[1]), Git will do the merge entirely on its own. If Git does see conflicts, it will stop in the middle of the merge. Git will make us fix up the files to contain the "right" final result, whatever we claim that is. Either way, whether we have to fix up the files ourselves or Git thinks it can do everything on its own, we eventually pick a final snapshot to use with our new commit, and we have Git make this new merge commit M, like this:

             I--J
            /    \
...--F--G--H      M
            \    /
             K--L

I took all the branch names out of this diagram for several reasons:

  • it's hard to make main point to H and one of the two brs point to M without having the text overlap;
  • the br name that points to M depends on which branch we're "on" when we run git merge, but the snapshot that goes in M depends only on the actual merge snapshot, and that's what we really care about here.

So let's just assume that, some time later, we have:

          I--J
         /    \
...--G--H      M--N   <-- somebranch (HEAD)
         \    /
          K--L

Our key concepts here (remember the "key insights" line from above?) are that merge commit M has two parents, J and L, and that the snapshot in M is the result of combining work. The "combining" took two diffs (a diff from H to J, and a diff from H to L) and smashed them together and applied the smashed-together changes to H to get M, possibly with human assistance.
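The "smash two diffs together and apply them to H" step can be sketched at whole-file granularity. This is a deliberate simplification: real Git combines changes line by line within each file, whereas the hypothetical `merge_snapshots` below only asks, per file, which side changed it relative to the base.

```python
def merge_snapshots(base, ours, theirs):
    """Toy three-way merge of dict snapshots; returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            pick = o            # both sides agree (or neither changed it)
        elif o == b:
            pick = t            # only "theirs" changed it: take theirs
        elif t == b:
            pick = o            # only "ours" changed it: take ours
        else:
            conflicts.append(path)  # both changed it differently
            continue
        if pick is not None:    # None means the file was deleted
            merged[path] = pick
    return merged, conflicts


# Made-up snapshots for merge base H and branch tips J and L.
H = {"story.txt": "red ball", "code.c": "if (color == RED)", "both.txt": "v0"}
J = {"story.txt": "blue ball", "code.c": "if (color == RED)", "both.txt": "v1"}
L = {"story.txt": "red ball", "code.c": "if (color == COLOR_RED)", "both.txt": "v2"}

merged, conflicts = merge_snapshots(H, J, L)
print(merged)
print(conflicts)
```

The blue ball from J and the COLOR_RED rename from L both survive, while both.txt, changed differently on each side, is left for a human to resolve.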


[1] Git considers two diffs to conflict with each other if they just "touch at the edges" (abut), too. This is an arbitrary choice: some merge algorithms don't call this a conflict, and some do.


Combined diffs, or, how can we "see" commit M?

When we look at an ordinary (single-parent) commit, we "see" it as a diff:

diff --git a/GIT-VERSION-GEN b/GIT-VERSION-GEN
index 120af376c1..b210b306b7 100755
--- a/GIT-VERSION-GEN
+++ b/GIT-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=GIT-VERSION-FILE
-DEF_VER=v2.37.0-rc2
+DEF_VER=v2.37.0
 
 LF='
 '

In this commit, just one file changed, GIT-VERSION-GEN; one line of that one file changed. A simple git diff shows us this. The diff algorithm itself can only compare two snapshots but we only have two snapshots to compare.

But for a merge commit like M, we have at least three snapshots. We have J, L, and M. (We might even have H, if we care to find it again. Unfortunately Git doesn't record H's hash ID, which I think is a mistake: we can run the same algorithm again to find H, but Git also doesn't record the algorithm used, and does offer us a choice of algorithms, so we're SOL, and that's why it's a mistake. I think Git should have recorded the algorithm and the merge bases used, just for completeness, but certainly it should have saved at least one of these.)

Some commands, including git log by default, just say, in effect: Oh, that's too hard. I just won't show any diff at all. They don't invoke the diff algorithm on merge commits.

Other commands, including git show by default, have a different answer: they invoke a combined diff. A combined diff, in Git, takes a merge commit snapshot like M, and runs more than one diff. For git show in particular, Git will list out the parent commits, in order—the list has an order—and run one git diff from the first parent to the merge, then a second git diff from the second parent to the merge. A merge can, technically, have two or more parents, so if it has three or more, Git keeps going here, running a third diff from the third parent to the merge, and so on.

Each diff here can list one or more changes to one or more files. One way to combine such diffs would be to literally combine them all, but that's not what git diff's combined diff does. Instead, it takes a couple of short-cuts, based on the idea that it's showing a merge commit.[2] Specifically, Git looks at each parent-vs-merge comparison first. If any file in any parent exactly matches the final file in the commit, Git throws that file out of the diff entirely!

For a merge commit, this leaves only those files where the merge commit's version of the file doesn't match any parent commit's version of the file. That is, for our merge commit M, file shown.txt doesn't match J's shown.txt and doesn't match L's shown.txt. Git took changes from both commits and combined them—hence the name "combined diff".

Now, maybe branch br1 changed lines 5 through 10 of shown.txt and branch br2 changed lines 105 through 110, so that there was no overlap at all. If that's the case, you'll see those changes with single + and - lines. These markers will show where lines were added or deleted, and which parent it was that got changed to produce the final result.

But maybe there was some overlap. Maybe br1 changed lines 5-10, and br2 changed line 7, right in the middle. Here, you'll see + and - lines where there are multiple + and/or - markers on the same line, like this example from the documentation:

- static void describe(char *arg)
 -static void describe(struct commit *cmit, int last_one)
++static void describe(char *arg, int last_one)

Here, parent #1 said static void describe(char *arg). Parent #2 said static void describe(struct commit *cmit, int last_one). The merged result says static void describe(char *arg, int last_one). The diff output tells you that both of the two input lines were deleted and the final result effectively adds the new line to both input files. (We cannot see the merge base commit's copy of this file at all, as Git has no idea which commit(s) were the merge base(s).)
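The column markers can be read mechanically: a combined diff line starts with one marker character per parent, and column N describes the line relative to parent N. The small decoder below is a sketch of that reading. The function name and the labels are made up, and it assumes you already know how many parents (commits after the first) were involved.

```python
def decode_combined_line(line, parent_count):
    """Split a combined-diff line into per-parent labels and the line text."""
    labels = []
    for marker in line[:parent_count]:
        if marker == "-":
            labels.append("removed")    # parent had this line, result does not
        elif marker == "+":
            labels.append("added")      # result has this line, parent does not
        else:
            labels.append("unchanged")  # no difference against this parent
    return labels, line[parent_count:]


example = [
    "- static void describe(char *arg)",
    " -static void describe(struct commit *cmit, int last_one)",
    "++static void describe(char *arg, int last_one)",
]

for line in example:
    print(decode_combined_line(line, parent_count=2))
```

A space in the middle or at the end of the marker columns is therefore not odd at all: it just means that, for this particular line, there is nothing to report against that particular parent.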


[2] This means that if you use it on something that isn't a merge commit, you're getting a deliberately defective diff. As long as you know this and take it into account, that's OK; just remember that Git does this.


-c vs --cc, and final notes

Note that when choosing a combined diff, you can either ask for -c or --cc. The difference between these is poorly documented: currently the main mention is in the git log documentation, under the --diff-merges option, which says this:

--diff-merges=combined
--diff-merges=c
-c

With this option, diff output for a merge commit shows the differences from each of the parents to the merge result simultaneously instead of showing pairwise diff between a parent and the result one at a time. Furthermore, it lists only files which were modified from all parents. -c implies -p.

--diff-merges=dense-combined
--diff-merges=cc
--cc

With this option the output produced by --diff-merges=combined is further compressed by omitting uninteresting hunks whose contents in the parents have only two variants and the merge result picks one of them without modification. --cc implies -p.

I'm not completely convinced that these descriptions are entirely accurate for all versions of Git, but they mean what they say: --cc "densifies" a combined diff by omitting any diff hunk where the merge result in that diff hunk matches any parent commit shown in that same hunk. This is useful with merges in that it shows us where a human did some picking and choosing during a merge conflict. (Note that it discards conflict cases where the human picked one of the two parents, but that already happens anyway, even without --cc, if the human did that for the entire file!)
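Taken literally, the quoted --cc rule is a predicate on a single hunk. The sketch below encodes only that literal reading of the documentation text; it is not taken from Git's source, and the name `is_uninteresting_hunk` and the string "versions" are placeholders for the hunk's contents in each commit.

```python
def is_uninteresting_hunk(parent_versions, result_version):
    """Literal reading of the --cc text quoted above.

    A hunk is dropped when the parents disagree in at most two ways and
    the merge result is exactly one of those parent versions.
    """
    variants = set(parent_versions)
    return len(variants) <= 2 and result_version in variants


# The merge simply picked parent 1's text for this hunk: dropped by --cc.
print(is_uninteresting_hunk(["red ball", "blue ball"], "red ball"))

# The merge wrote something neither parent had: kept by --cc.
print(is_uninteresting_hunk(["red ball", "blue ball"], "green ball"))
```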

For a more precise definition of "diff hunk", see In the context of git (and diff), what is a "hunk". VonC's answer here also discusses the "indent heuristic" I mentioned above.

Last, remember that any combined diff deliberately throws away some information. It does so on the assumption that the "final" snapshot is that in a merge commit. You can avoid combined diffs, even for merge commits, by asking Git to "virtually split" the merge, using git show -m or git log -p -m for instance. When Git encounters a merge commit M here, it pretends, for diff-ing purposes, that there are two or more commits, one for each parent. Commit M (from J) gets shown as git diff J M, and commit M (from L) gets shown as git diff L M. For an octopus merge—which is what Git calls any merge with three or more parents—you'll get three or more "virtual splits". (However, if you add --first-parent to the git log options here, Git does the split, then only shows the first-parent-vs-merge-commit diff. For some workflows this is actually a very useful option.)

torek
  • "A merge can, technically, have three or more parents" Shouldn't this be two or more? – Adrian Jun 29 '22 at 14:42
  • So a merge commit is only relating to a parent and their direct children. As soon as you go to a grandchild or further, this is no longer a merge commit? So that means that if you did go further than a child, what I would be seeing is garbage? – Adrian Jun 29 '22 at 15:05
  • @Adrian: well, 3 is more than 2, so both statements would be true. My point in saying "can have 3 or more" is to *augment* the earlier claim that it has two so I started with N+1 instead of N, but in other answers on SO, I do use the phrase "2 or more". As for doing `git diff `, the *mechanics* here are the same as they are for `git diff `: Git uses the tree of each commit, in the same mechanical process. It's just that if the tree of the first-named-commit isn't the *result of a merge*, then [continued] – torek Jun 30 '22 at 09:00
  • ... then the way that a "combined merge" deliberately *drops* part of the diffs no longer makes any sense. It's roughly like saying "well, if you are trying to identify some kind of tree, we need both the leaf and the twig" (this is true in the world of tree-identification, and there's a technical definition of "twig" here) ... so someone hands you a leaf and a twig, but they're from two different forests and hence probably not from a single tree and you'll just get nonsense as a result. – torek Jun 30 '22 at 09:02
  • In any case, I rephrased the "technically" bit above so that it's more technically accurate and even longer now. :-) (I think I'll add a TL;DR section as well, if I can phrase it properly.) – torek Jun 30 '22 at 09:04
  • Very detailed and definitely TL;DR. Thanks for that. – Adrian Jun 30 '22 at 20:00