Is there a way to check if two different git commits are equal in content?

Question

I know that git tracks content and generates a sha based partially on the content. However, the sha is also based upon the parent commit. When I rebase a branch, because my commits now have a different ancestor, all of my commits have different shas.

What I'm wondering is whether there is a way to compare two commits (or commit ranges) to see if they are the same content-wise. It should also be able to tell whether a binary change is the same.

I'm thinking if there was some way to get the sha for the content without the ancestor information incorporated, that might do it.

Thanks for any and all help,

http://stackoverflow.com/questions/1191282/git-diff-commits-difference — Seçkin Savaşçı, Jul 12 '12 at 19:19

score 15 · Answer 1 · answered Jul 12 '12 at 20:41


You want the --cherry-mark option to git log, which marks commits with an equals sign when their patch content is the same.

git log --decorate --graph --oneline --cherry-mark --boundary A...B

is a great way to compare the rebased branch B with the original branch A. I use this for checking that my commits made using git-tfs are still ok once TFS has been at them.
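
For a scripted yes/no check on two individual commits, one option is git patch-id, which hashes a diff with line numbers ignored. As far as I understand, this is the same notion of patch equivalence that --cherry-mark relies on, and it is close to the "sha for the content without the ancestor information" the question asks about. A minimal sketch, assuming a POSIX shell and a git recent enough to support --stable; same_patch is a hypothetical helper name, not a git command:

```shell
# Succeeds when both commits introduce the same patch, regardless of parent.
# same_patch is a hypothetical helper; run it inside the repository.
same_patch() {
    # git patch-id prints "<patch-id> <commit-id>"; keep only the patch id.
    id1=$(git show "$1" | git patch-id --stable | cut -d' ' -f1)
    id2=$(git show "$2" | git patch-id --stable | cut -d' ' -f1)
    # An empty id (for example, no diff to hash) never counts as a match.
    [ -n "$id1" ] && [ "$id1" = "$id2" ]
}
```

Usage would look like: same_patch 123456 abcdef && echo "same patch". How binary changes are treated depends on how git show renders them, so this sketch should not be trusted for binary files without checking.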


patthoyts


Hmmm... this one seems the most promising, except for a few things I can't figure out. The first one is, how do you have a rebased branch B and an original branch A? Are you creating a backup branch or spawning a new one? In my scenario, I'm simply rebasing an existing branch on master. The second one is that in my binary files, even though their checksum is the exact same, it's marking the commits as inequivalent. Is there a piece I'm missing? – Nate Cavanaugh Jul 12 '12 at 22:13
which version of git are you using with --cherry-mark? I do not find it on what I have installed. – Chris Cleeland Aug 23 '12 at 13:38
This option was added to revision.c in commit adbbb31 on 9-Mar-2011 and is present in git 1.7.5 and above. (I used git gui blame and git describe to find this out). – patthoyts Aug 23 '12 at 13:49
this is amazing – orip Dec 11 '18 at 12:09

score 7 · Answer 2 · answered Aug 29 '19 at 21:12


https://stackoverflow.com/a/23527631/2630028

diff <(git show 123456) <(git show abcdef)


solstice333


score 0 · Answer 3 · answered Jul 12 '12 at 19:20


Commit headers contain a sha for the tree object, which represents the filesystem content (content + structure). It wouldn't be too hard to write a script to compare those.

Blobs also have IDs, so you could compare those if you're just looking at one or two files.
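
A rough sketch of such a script, assuming a POSIX shell; same_tree and same_blob are hypothetical helper names. Note that a tree ID describes the whole snapshot at a commit, not the change the commit introduced, so two commits only match when every file is identical at both:

```shell
# Succeeds when both commits point at the same tree (identical snapshot).
# Returns 2 if either revision cannot be resolved.
same_tree() {
    t1=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
    t2=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
    [ "$t1" = "$t2" ]
}

# Succeeds when one file has the same blob ID in both commits.
# Arguments: commit1 commit2 path
same_blob() {
    b1=$(git rev-parse --verify --quiet "$1:$3") || return 2
    b2=$(git rev-parse --verify --quiet "$2:$3") || return 2
    [ "$b1" = "$b2" ]
}
```

Because blob IDs are derived from the file contents, same_blob works for binary files as well as text.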


Ben Collins


Unfortunately, I'm not sure that's quite correct (at least regarding the tree). Basically, when I view the tree sha for the commit, it's changed, even though the branch was only rebased (nothing changed about the file). – Nate Cavanaugh Jul 13 '12 at 20:59
the tree id could change by some other blob in the directory having changed in some way. If all you want to do is compare differences between two blobs, then you'll have to write code of some sort to decompress the blobs, strip the headers, and then hash what's left in order to do the kind of comparison you want. Of course, you could just do `git log A..B -- path` and save yourself the trouble. – Ben Collins Jul 14 '12 at 03:54

Jonny · Answer 4 · 2022-01-26T17:30:58.833

@solstice333's answer addresses the problem behind the question, but the output still requires a lot of parsing. If you want a quick way to confirm that commits are the same after a rebase, an extension of that solution removes most of the clutter.

Solution

hash1=123456
hash2=098765
sed_command='/^\(diff\|index\|@@\).*$/ d'
diff <(git show --oneline -U0 "$hash1" | sed "$sed_command") <(git show --oneline -U0 "$hash2" | sed "$sed_command")

(The <(...) process substitution requires bash, zsh, or ksh. If you'd rather not set variables, you can use just the bottom line and substitute the values directly.)

Breaking it down

This command finds the changes from each commit, cleans up the output, and finds the differences in the outputs. The outputs are trimmed down via:

--oneline - condenses the commit header to a single line (the abbreviated hash on that line will still differ between the two commits)
-U0 - shows only the changed lines, with no surrounding context; shorthand for --unified=0
sed - reduces clutter in the output by deleting every line that starts with diff, index, or @@, per the regex /^\(diff\|index\|@@\).*$/. Some of these conditions can be removed for better specificity, but they generally do not matter given the nature of the differences in a rebase.
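
To see what the filter leaves behind, it can be tried on its own. This is only a sketch: strip_diff_noise is a hypothetical name, and the three -e expressions are a portable spelling of the same idea as the \| alternation used above, which is a GNU sed extension:

```shell
# Reads a unified diff on stdin and drops the lines that usually differ
# after a rebase: "diff --git" headers, "index" blob hashes, "@@" hunk headers.
strip_diff_noise() {
    sed -e '/^diff/d' -e '/^index/d' -e '/^@@/d'
}
```

For example, git show --oneline -U0 123456 | strip_diff_noise keeps the commit line, the ---/+++ file names, and the added and removed lines.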

I suppose you could also extend this to use git diff in place of the inner git show commands, to encapsulate all changes in a commit range.
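
A sketch of that range variant, assuming a POSIX shell; same_range_changes is a hypothetical helper name, and the four arguments are the start and end of the original range followed by the start and end of the rebased one. By default git diff reports a binary change only as "Binary files ... differ", so this compares text changes, not binary content:

```shell
# Succeeds when the net change across base1..tip1 equals base2..tip2,
# ignoring diff headers, index hashes, and hunk positions.
# Returns 2 if any of the four revisions cannot be resolved.
same_range_changes() {
    for r in "$1" "$2" "$3" "$4"; do
        git rev-parse --verify --quiet "$r^{commit}" >/dev/null || return 2
    done
    d1=$(git diff -U0 "$1" "$2" | sed -e '/^diff/d' -e '/^index/d' -e '/^@@/d')
    d2=$(git diff -U0 "$3" "$4" | sed -e '/^diff/d' -e '/^index/d' -e '/^@@/d')
    [ "$d1" = "$d2" ]
}
```

This checks the combined effect of each range rather than matching commits one by one, so it will not notice commits that were squashed or reordered as long as the end result is the same.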
