
Let me preface this by saying that I am aware the odds of this happening are minuscule. I know that it would be more or less impossible to manufacture, and extremely unlikely to happen "in the wild." This is simply a what-if question about the internals of Git.

So, here is my question: what would happen if two Git commit hashes were identical? For starters:

  • Would the commit succeed?
  • Could it later be checked out as a detached HEAD?
  • Would a subsequent commit be possible?
Ben
  • There's some discussion here, but I don't think it really does a great job of answering the question: http://stackoverflow.com/questions/10434326/hash-collision-in-git – mipadi Sep 16 '15 at 22:14
  • I actually saw that, but it appears to be discussing file hashes rather than commit hashes. – Ben Sep 16 '15 at 22:15
  • Yeah, most of the answers focused on that part of hashing in Git. There is a link to a discussion on the Git mailing list, though: http://thread.gmane.org/gmane.comp.version-control.git/26106/focus=26170 – mipadi Sep 16 '15 at 22:16
  • @mipadi: that deals with collisions on hashes of *files*, not hashes of *commits*. – Willem Van Onsem Sep 16 '15 at 22:36
  • @CommuSoft: Yes, that was noted and responded to in the comments above yours. – mipadi Sep 16 '15 at 22:42
  • It's likely this has never happened "in the wild". One way to investigate the consequences if it *did* happen would be to modify a copy of git so that it (incorrectly) generates the same hash for two commits. – Keith Thompson Sep 16 '15 at 22:45
  • It doesn't matter much whether it's "files" or "commits" because internally, git just thinks of everything as "objects". If two *objects* have the same hash, they're the *same* object as far as git can tell, and only the first one to go in, goes in, because after that the second thing that has a hash collision, git just assumes it's the first thing again. I can see that this might cause a command to go a bit haywire when it tries to write a new object but instead accesses an existing object of the "wrong" type, but fundamentally, your repo just goes read-only, in a sense. – torek Sep 16 '15 at 23:53

2 Answers


My old answer "How would git handle a SHA-1 collision on a blob?" would still apply, even for a commit and not a blob.
As torek mentions in the comments, git just thinks of everything as "objects", each with their own SHA1.

https://git-scm.com/book/en/v2/book/10-git-internals/images/data-model-4.png

(Image from Git Internals - Git References chapter of the ProGit Book v2)
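To make the "everything is an object, and the first one in wins" point concrete, here is a minimal sketch of a content-addressed store. It is not Git's code: toy_hash(), struct toy_store, and the toy_* functions are hypothetical names invented for illustration, and the 8-bit byte sum stands in for SHA-1 only so that a collision is trivial to force.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STORE_SLOTS 256

/* Toy stand-in for SHA-1 (assumption): an 8-bit sum of the bytes,
 * chosen so that collisions are easy to produce on purpose. */
unsigned char toy_hash(const char *data)
{
    unsigned int sum = 0;
    for (const unsigned char *p = (const unsigned char *)data; *p; p++)
        sum += *p;
    return (unsigned char)(sum & 0xff);
}

/* Objects are indexed by their toy ID; NULL means "not stored yet". */
struct toy_store {
    char *objects[STORE_SLOTS];
};

/* Store data under its ID. If the ID is already present, the existing
 * object is kept untouched and the call still reports success.
 * Returns the ID, or -1 if memory allocation fails. */
int toy_write_object(struct toy_store *store, const char *data)
{
    assert(store != NULL && data != NULL);
    unsigned char id = toy_hash(data);
    if (store->objects[id] == NULL) {
        char *copy = malloc(strlen(data) + 1);
        if (copy == NULL)
            return -1;
        strcpy(copy, data);
        store->objects[id] = copy;
    }
    return id;
}

/* Look an object up by ID; returns NULL if nothing is stored there. */
const char *toy_read_object(const struct toy_store *store, int id)
{
    if (id < 0 || id >= STORE_SLOTS)
        return NULL;
    return store->objects[id];
}

/* Release every stored object. */
void toy_store_free(struct toy_store *store)
{
    for (int i = 0; i < STORE_SLOTS; i++) {
        free(store->objects[i]);
        store->objects[i] = NULL;
    }
}
```

With a zero-initialized store, writing "ab" and then "ba" yields the same ID for both (the byte sums match), and reading that ID back gives "ab": the second, colliding object is silently treated as the one already present.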

While the commit would likely not succeed (there are a couple of checks in git-commit-tree.c), you also have to consider the case where two commits with the same SHA1 (and somehow different content) are created in repos A and B... and repo A is fetching repo B!
Commit 8685da4 (March 2007, git 1.5.1) took care of that, and the fetch would fail.
Commit 0e8189e (Oct. 2008, git 1.6.1) does mention that, with index V2:

the odds for a SHA1 reference to get corrupted so it actually matches the SHA1 of another object with the same size (the delta header stores the expected size of the base object to apply against) are virtually zero.

It still implements a packed object CRC check when unpacking objects.

The Git code discussed in the other answer below is the finalize_object_file() function. A blame shows no recent modification; most of the code dates back to the very beginning of Git (2005). In that code path, no new commit is created.

VonC

According to the source code (as of git v2.17), if a commit leads to an already existing SHA-1, this is what would happen on Linux (on other operating systems it might be different).

Would the commit succeed?

Yes and no: the git commit command would return as if it had succeeded, but the new commit object would not be created.

Could it later be checked out as a detached head?

No.

Reference: file sha1-file.c (commit fc1395f4a491a7da46a446233531005634eb979d)

int finalize_object_file(const char *tmpfile, const char *filename)
{
    int ret = 0;

    if (object_creation_mode == OBJECT_CREATION_USES_RENAMES)
        goto try_rename;
    else if (link(tmpfile, filename))
        ret = errno;

    /*
     * Coda hack - coda doesn't like cross-directory links,
     * ...
     */
    if (ret && ret != EEXIST) {
    try_rename:
        if (!rename(tmpfile, filename))
            goto out;
        ret = errno;
    }
    unlink_or_warn(tmpfile);
    if (ret) {
        if (ret != EEXIST) {
            return error_errno("unable to write sha1 filename %s", filename);
        }
        /* FIXME!!! Collision check here ? */
    }

out:
    if (adjust_shared_perm(filename))
        return error("unable to set permission to '%s'", filename);
    return 0;
}

The link() call fails with EEXIST, the temporary file is removed, and execution continues to the return 0 (passing the FIXME and adjust_shared_perm(), which has no reason to fail). The object file that already exists under that name is left untouched.
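The EEXIST path can be tried outside Git with a small POSIX sketch. This is a simplified imitation of the flow described above, not Git's implementation: toy_finalize(), write_text_file(), and read_text_file() are hypothetical helpers, the rename fallback and permission handling are left out, and it assumes a filesystem that supports hard links.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create or overwrite a file with the given text. Returns 0 on success. */
int write_text_file(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(text, f) >= 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Read up to size - 1 bytes of a file into buf and NUL-terminate it. */
int read_text_file(const char *path, char *buf, size_t size)
{
    assert(size > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}

/* Simplified imitation of the link-then-unlink flow: hard-link the
 * temporary file to its final name, remove the temporary file, and
 * treat EEXIST as success. Returns 0 on success, else an errno value. */
int toy_finalize(const char *tmp_path, const char *final_path)
{
    int ret = 0;

    if (link(tmp_path, final_path))
        ret = errno;
    unlink(tmp_path);
    if (ret && ret != EEXIST)
        return ret;
    /* EEXIST: an object with this name is already there; nothing is
     * compared, so the new content is simply dropped. */
    return 0;
}
```

If a file named toy_object already holds "first commit" and a temporary file holds "second commit", calling toy_finalize() on the pair returns 0, the temporary file disappears, and toy_object still contains "first commit". That matches the behavior described above: the caller sees success, but the colliding content never lands on disk.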

user803422