
I was wondering if it is possible to create two commits with the same hash.

Let's say I'm on the master branch and I create a new branch called foo. Now let's say I have two terminal sessions, both set up to commit as the author john.smith@gmail.com. One terminal session is on master and the other is on foo, and both branches have exactly the same staged changes. Now let's say I run the git commit command at exactly the same time in both terminal sessions...

Wouldn't the two commits end up having the same hash value?

Ogen
  • please refer to the answers at http://stackoverflow.com/questions/9392365/how-would-git-handle-a-sha-1-collision-on-a-blob – gzh Jan 31 '17 at 07:15
  • @Slava.K it's quite different, my question isn't about how git would handle the situation, my question is about if you can actually create the situation. – Ogen Jan 31 '17 at 07:21
  • 1
    See also http://stackoverflow.com/questions/25128077/how-does-git-assure-that-commit-sha-keys-for-identical-operations-data-are-still - in footnote 1 I give a formula for calculating "unforced" collisions probabilities. If you can break, or partially break, SHA-1, you can increase the probability of a collision. However, even if you have a known second preimage attack for general SHA-1, you may need to modify it to work with Git. – torek Jan 31 '17 at 10:01



Yes, it is theoretically possible to end up in this situation. The commit hash is computed from the content of the commit object, which consists of:

  • The commit message
  • The author name and email
  • The authoring timestamp (with its time zone offset)
  • The committer name and email
  • The commit timestamp (with its time zone offset)
  • The list of parent commits
  • The tree object reference

The tree object reference is itself an object hash, computed from the tree's entries: file names, modes, and references to blob objects and subtrees. So it will be identical for an identical tree of files.
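A minimal sketch of why identical files give identical ids, assuming Git's classic SHA-1 object format, where every object id is the SHA-1 of a header `<type> <size>\0` followed by the object's bytes. The helper names (`git_object_id`, `blob_id`, `flat_tree_id`) are hypothetical, and the tree function deliberately handles only a flat directory of regular files (mode `100644`), not subdirectories, executables, or symlinks.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' followed by the raw object bytes."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    """Id of a file's contents, as `git hash-object` would compute it."""
    return git_object_id("blob", content)


def flat_tree_id(files: dict[str, bytes]) -> str:
    """Tree id for a directory holding only regular files (simplified).

    Each entry is '<mode> <name>\\0' plus the 20 raw bytes of the blob id.
    Sorting by name matches Git's order only because there are no subtrees.
    """
    body = b""
    for name in sorted(files):
        raw_sha = bytes.fromhex(blob_id(files[name]))
        body += b"100644 " + name.encode() + b"\0" + raw_sha
    return git_object_id("tree", body)


print(blob_id(b"test content\n"))  # same id git hash-object reports for this content
print(flat_tree_id({"a.txt": b"x\n"}) == flat_tree_id({"a.txt": b"x\n"}))  # True
```

Nothing in these ids depends on who computed them or on which branch you are, only on the bytes, which is what makes the tree reference identical in both terminal sessions.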

So if all those properties of the commit are identical, then yes, you would end up with the same hash. You can deliberately construct this by committing the same tree with the same message, parent, and author at the same time; since the timestamp only has a resolution of one second, you don't even need to be that precise.
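To make the "same second" scenario concrete, here is a minimal sketch that assembles a commit object in Git's text format (`tree`, `parent`, `author`, and `committer` lines, a blank line, then the message) and hashes it with the same `<type> <size>\0` header Git uses. The helper `commit_id`, the tree and parent ids, and the timestamp are hypothetical placeholders, and the sketch ignores optional headers (such as signatures) that real commits can carry.

```python
import hashlib


def commit_id(tree: str, parents: list[str], author: str, author_time: int,
              committer: str, commit_time: int, tz: str, message: str) -> str:
    """SHA-1 of a commit object built from the fields Git stores in it."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {author_time} {tz}")
    lines.append(f"committer {committer} {commit_time} {tz}")
    body = ("\n".join(lines) + "\n\n" + message).encode()
    header = f"commit {len(body)}\0".encode()  # size is the byte length
    return hashlib.sha1(header + body).hexdigest()


TREE = "2" * 40    # hypothetical id of the identically staged tree
PARENT = "1" * 40  # hypothetical tip shared by master and the new branch foo
WHO = "John Smith <john.smith@gmail.com>"
NOW = 1485846900   # hypothetical Unix timestamp; Git stores whole seconds


def session(seconds: int) -> str:
    # Each terminal commits the same tree on top of the same parent.
    return commit_id(TREE, [PARENT], WHO, seconds, WHO, seconds,
                     "+0100", "Same change\n")


a = session(NOW)
b = session(NOW)
c = session(NOW + 1)
print(a == b)  # True: identical fields give an identical hash
print(a == c)  # False: one second later changes the hash
```

The branch name appears nowhere in the commit object, which is why committing on master and on foo makes no difference here.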

But is this a problem in practice? Not really: You would usually not commit with the same user at the same time; instead, you would have separate contributors with their own identity working on their own stuff. So the probability of commits getting the same hash is near zero.

But even if this situation did happen in practice, would there be a problem? No. The commits are identical by definition (and by construction), so they are the same commit. They are also compatible with each other, so when you push or pull later, it will just look as if you already had that commit, and nothing happens.
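The "nothing happens" part follows from content addressing: objects are stored under their hash, so receiving an object you already have adds nothing. The sketch below is a deliberately simplified toy model of that idea (the class `ToyObjectStore` and its methods are hypothetical), not Git's on-disk format or its actual fetch/push protocol.

```python
import hashlib


class ToyObjectStore:
    """Toy content-addressed store: every object is keyed by the hash of its bytes."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> tuple[str, bool]:
        """Store data; return (object id, whether anything new was written)."""
        oid = hashlib.sha1(data).hexdigest()
        if oid in self.objects:
            return oid, False
        self.objects[oid] = data
        return oid, True

    def fetch_from(self, other: "ToyObjectStore") -> int:
        """Copy the objects we lack from another store; return how many were new."""
        missing = [oid for oid in other.objects if oid not in self.objects]
        for oid in missing:
            self.objects[oid] = other.objects[oid]
        return len(missing)


commit_bytes = b"the same commit object, created in two places"
mine, theirs = ToyObjectStore(), ToyObjectStore()
mine.put(commit_bytes)
theirs.put(commit_bytes)
print(mine.fetch_from(theirs))  # 0: we already "had" that commit
```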

Of course, there remains the problem of hash collisions due to the limited hash space of SHA-1. This could become a problem in very large repositories, but I haven't heard of it happening yet, although there are already repositories of gigantic sizes. And even if it happened in one of those, it would not affect other repositories of more manageable sizes.
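For a rough sense of scale, the standard birthday-bound approximation gives the chance of at least one accidental collision among n objects with a 160-bit hash as p ≈ 1 − exp(−n(n−1)/2^161). This sketch assumes SHA-1 outputs behave like uniformly random values, so it covers only accidental collisions, not deliberately crafted ones; `collision_probability` is a hypothetical helper name.

```python
import math

HASH_BITS = 160  # SHA-1 object ids are 160 bits long


def collision_probability(n: int, bits: int = HASH_BITS) -> float:
    """Birthday-bound estimate of P(at least one collision) among n random ids."""
    pairs = n * (n - 1) // 2  # number of distinct pairs of objects
    # p = 1 - exp(-pairs / 2**bits); expm1 keeps precision when p is tiny
    return -math.expm1(-pairs / 2**bits)


for n in (10**6, 10**9, 2**80):
    print(f"{n:.3e} objects -> p ~ {collision_probability(n):.3e}")
```

Even a billion objects gives a probability far below anything that matters in practice; the odds only approach even around 2^80 objects.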

poke
  • 1
    Thanks, this is a well-worded answer - very easy to understand. I never realized that even if it *did* happen it wouldn't be an issue! – Ogen Jan 31 '17 at 07:20