3

Every time you make a commit, Git (or Mercurial) generates a SHA-1 hash that uniquely identifies that commit in the repository's history.

Suppose I want to merge two repositories (about which we know nothing in advance). This raises the question: if someone wanted a specific commit from the merged repository, could there be a duplicated SHA hash that would confuse Git when fetching that commit? And if not, what would Git do?

Ultimately I guess the question is also: are there duplicate hashes across all the repositories in the world?

Patrick Bassut

2 Answers

10

Are there duplicate hashes across all the repositories in the world?

Possibly yes, but that's extremely unlikely. Let me quote the Git book on this one, which contains a very illustrative example:

A lot of people become concerned at some point that they will, by random happenstance, have two objects in their repository that hash to the same SHA-1 value. What then?

If you do happen to commit an object that hashes to the same SHA-1 value as a previous object in your repository, Git will see the previous object already in your Git database and assume it was already written. If you try to check out that object again at some point, you’ll always get the data of the first object.

[...]

Here’s an example to give you an idea of what it would take to get a SHA-1 collision. If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (3.6 million Git objects) and pushing it into one enormous Git repository, it would take roughly 2 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.

In short: yes, a SHA-1 collision is theoretically possible, but it is so astronomically unlikely that Git simply does not account for this case.
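To sanity-check the Git book's figures, here is a rough sketch using the standard birthday-bound approximation, p ≈ 1 - exp(-n² / (2 · 2¹⁶⁰)). The population, object count, and two-year duration come from the quote above; the assumption of 365-day years and the function name `collision_probability` are mine, so treat this as a back-of-the-envelope estimate rather than an exact calculation.

```python
import math

# Figures taken from the Git book example quoted above.
PEOPLE = 6_500_000_000          # programmers on Earth in the example
OBJECTS_PER_SECOND = 3_600_000  # Git objects each person pushes per second
SECONDS = 2 * 365 * 24 * 3600   # two years (assumption: 365-day years)

HASH_SPACE = 2 ** 160           # number of distinct SHA-1 values


def collision_probability(n_objects: int, space: int = HASH_SPACE) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n^2 / (2 * space))."""
    # expm1 keeps precision when the exponent is very close to zero.
    return -math.expm1(-(n_objects * n_objects) / (2 * space))


total_objects = PEOPLE * OBJECTS_PER_SECOND * SECONDS
p = collision_probability(total_objects)
print(f"objects: {total_objects:.3e}, collision probability: {p:.2f}")
```

With these inputs the estimate comes out at roughly one half, which is consistent with the "50% probability" in the quote. Plugging in the object count of a realistic repository gives a probability that is indistinguishable from zero.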

helmbert
0

Ultimately I guess the question is also: are there duplicate hashes across all the repositories in the world?

Unlikely. At the time of writing (January 2016), no one has publicly demonstrated a SHA-1 collision. Duplicate hashes certainly exist across repositories, but they identify the very same objects with the very same contents.
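To illustrate why identical hashes in unrelated repositories normally mean identical contents, here is a minimal sketch of how Git derives a blob's ID in the classic SHA-1 object format: it hashes a short header (`blob <size>\0`) followed by the file's bytes. The helper name `git_blob_id` is hypothetical, and repositories created with the newer SHA-256 object format would not match this.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes "blob <decimal length>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# The same bytes always produce the same ID, whichever repository holds them.
same_a = git_blob_id(b"hello world\n")
same_b = git_blob_id(b"hello world\n")
# A one-character change produces an unrelated ID.
other = git_blob_id(b"hello world!\n")

print(same_a == same_b)  # True
print(same_a == other)   # False
```

You can compare the result against a real repository with `git hash-object <file>`. Because the ID depends only on the contents, merging two repositories that contain the same file simply yields one shared object rather than a conflict.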

see also:

How would Git handle a SHA-1 collision on a blob? and Probability of SHA1 collisions

TimWolla
  • But if that happens one day, what would Git do? – Patrick Bassut Jan 19 '16 at 22:34
  • @PatrickBassut My answer already links to another answer that explains what would happen: http://stackoverflow.com/questions/9392365/how-would-git-handle-a-sha-1-collision-on-a-blob – TimWolla Jan 19 '16 at 22:36