How does Git(Hub) handle possible collisions from short SHAs?

Question

Both Git and GitHub display short versions of SHAs -- just the first 7 characters instead of all 40 -- and both Git and GitHub support taking these short SHAs as arguments.

E.g. git show 962a9e8

E.g. https://github.com/joyent/node/commit/962a9e8

Given that the possibility space is now orders of magnitude lower, "just" 268 million, how do Git and GitHub protect against collisions here? And how do they handle them?

This would not be a concern at the level of GitHub because sha1's are unique to each individual project. — Tone, Aug 20 '11 at 00:17
It's still entirely possible for two 7-character short sha1s to collide within a single project. — Keith Thompson, Aug 20 '11 at 18:12
Does anyone know if it is possible to grab commits via github's API with short SHA... For instance, https://github.com/alexnaspo/var_dumpling-chrome/commit/9e9726ac returns the commit I need, but https://api.github.com/repos/alexnaspo/var_dumpling-chrome/git/commits/9e9726ac does not — Alex Naspo, Nov 26 '12 at 23:03

emboss · Accepted Answer · 2014-09-29T15:32:39.247

68

These short forms are just to simplify visual recognition and to make your life easier. Git doesn't really truncate anything; internally, everything is handled with the complete value. You can use a partial SHA-1 at your convenience, though:

Git is smart enough to figure out what commit you meant to type if you provide the first few characters, as long as your partial SHA-1 is at least four characters long and unambiguous — that is, only one object in the current repository begins with that partial SHA-1.
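
The lookup rule in that quote can be sketched in a few lines of Python. This only illustrates the rule as stated (at least four characters, exactly one match). It is not Git's actual implementation, and the names `resolve_prefix`, `AmbiguousPrefix` and the sample ids are made up for the example.

```python
MIN_PREFIX_LEN = 4  # minimum length stated in the quote above


class AmbiguousPrefix(Exception):
    """Raised when more than one object id starts with the prefix."""


def resolve_prefix(prefix, object_ids):
    """Return the single full id in `object_ids` that starts with `prefix`.

    `object_ids` is any iterable of full 40-character hex ids
    (hypothetical sample data in this sketch).
    """
    prefix = prefix.lower()
    if len(prefix) < MIN_PREFIX_LEN:
        raise ValueError("prefix must be at least 4 hex characters")
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError(f"no object starts with {prefix}")
    if len(matches) > 1:
        raise AmbiguousPrefix(
            f"short id {prefix} matches {len(matches)} objects"
        )
    return matches[0]


# Made-up ids, for illustration only.
SAMPLE_IDS = [
    "962a9e8d" + "0" * 32,
    "962a1111" + "f" * 32,
    "1234abcd" + "0" * 32,
]
```

With this sample data, `resolve_prefix("962a9", SAMPLE_IDS)` returns the first id, while `resolve_prefix("962a", SAMPLE_IDS)` raises `AmbiguousPrefix` because two ids share those four characters.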

edited Sep 29 '14 at 15:32

answered Aug 20 '11 at 00:51

emboss


15

Thanks! That link elaborates further: "Git can figure out a short, unique abbreviation for your SHA-1 values. If you pass `--abbrev-commit` to the git log command, the output will use shorter values but keep them unique; it defaults to using seven characters but makes them longer if necessary to keep the SHA-1 unambiguous." – Aseem Kishore Aug 23 '11 at 01:17
13

Another useful quote: "Generally, eight to ten characters are more than enough to be unique within a project. One of the largest Git projects, the Linux kernel, is beginning to need 12 characters out of the possible 40 to stay unique." – Aseem Kishore Aug 23 '11 at 01:18
Your link is broken... :( – Mrchief Sep 25 '14 at 18:43
@emboss I think the question is not about whether Git is smart enough. Let's assume you have a CI/CD pipeline that tags an artifact with the short form of a commit SHA. In that case, the smartness of Git really does not matter. – igops Mar 04 '23 at 13:37

score 35 · Answer 2 · answered Aug 19 '11 at 23:55

35

I have a repository that has a commit with an id of 000182eacf99cde27d5916aa415921924b82972c.

git show 00018

shows the revision, but

git show 0001

prints

error: short SHA1 0001 is ambiguous.
error: short SHA1 0001 is ambiguous.
fatal: ambiguous argument '0001': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions

(If you're curious, it's a clone of the git repository for git itself; that commit is one that Linus Torvalds made in 2005.)
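
The behaviour above (five characters work, four do not) amounts to finding the shortest prefix that no other object shares. A rough Python sketch of that idea, assuming a plain list of full ids: `shortest_unique_prefix` is a hypothetical helper, not a Git API, and the second id below is invented so that it collides on `0001`.

```python
def shortest_unique_prefix(target, object_ids, min_len=4):
    """Return the shortest prefix of `target` (at least `min_len` chars)
    that no other id in `object_ids` starts with."""
    others = [oid for oid in object_ids if oid != target]
    for length in range(min_len, len(target) + 1):
        prefix = target[:length]
        if not any(oid.startswith(prefix) for oid in others):
            return prefix
    return target


# The first id is the commit quoted in this answer; the second is a
# made-up id that shares its first four characters.
IDS = [
    "000182eacf99cde27d5916aa415921924b82972c",
    "0001" + "f" * 36,
]

print(shortest_unique_prefix(IDS[0], IDS))
```

In a real repository the candidate list would come from Git itself, for example from the `git rev-list --all --objects` output mentioned in the comments.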

answered Aug 19 '11 at 23:55

Keith Thompson


9

If you need to know which objects match your ambiguous id (`0001` in this case), you can run `git rev-list --all --objects | grep ^0001`. Once you have the list of possible full SHA1s, you can run `git show` on each one. – Mikko Rantalainen Jan 22 '13 at 06:32
1

[This answer](http://stackoverflow.com/a/27428930/841555) shows how to disambiguate using only a git command. – Jeremy May 06 '16 at 16:55

VonC · Answer 3 · 2018-04-25T07:45:49.513

14

Two notes here:

If you type y anywhere on the GitHub page displaying a commit, you will see the full 40 hex characters of that commit's SHA-1.
That illustrates emboss's point: GitHub doesn't truncate anything.
And 7 hex digits (28 bits) hasn't been enough since 2010 anyway.
See commit dce9648 by Linus Torvalds himself (Oct 2010, git 1.7.4.4):

The default of 7 comes from fairly early in git development, when seven hex digits was a lot (it covers about 250+ million hash values). Back then I thought that 65k revisions was a lot (it was what we were about to hit in BK), and each revision tends to be about 5-10 new objects or so, so a million objects was a big number.

(BK = BitKeeper)

These days, the kernel isn't even the largest git project, and even the kernel has about 220k revisions (much bigger than the BK tree ever was) and we are approaching two million objects. At that point, seven hex digits is still unique for a lot of them, but when we're talking about just two orders of magnitude difference between number of objects and the hash size, there will be collisions in truncated hash values. It's no longer even close to unrealistic - it happens all the time.

We should both increase the default abbrev that was unrealistically small, and add a way for people to set their own default per-project in the git config file.
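
The arithmetic behind that quote can be checked with a birthday-style estimate. The sketch below assumes object ids behave like uniformly random values, which is an idealisation. The figure of two million objects is taken from the quote, and the function names are made up for the example.

```python
import math


def expected_colliding_pairs(num_objects, hex_digits):
    """Expected number of id pairs sharing the same first `hex_digits`
    hex characters, assuming uniformly random ids."""
    space = 16 ** hex_digits  # 16**7 == 268,435,456 for seven digits
    return num_objects * (num_objects - 1) / (2 * space)


def collision_probability(num_objects, hex_digits):
    """Approximate chance that at least one such pair exists
    (birthday approximation: 1 - exp(-expected_pairs))."""
    return 1.0 - math.exp(-expected_colliding_pairs(num_objects, hex_digits))


# Two million objects, as in the quote, at several abbreviation lengths.
for digits in (7, 10, 12):
    pairs = expected_colliding_pairs(2_000_000, digits)
    prob = collision_probability(2_000_000, digits)
    print(f"{digits} digits: ~{pairs:.4g} colliding pairs, P(any) ~ {prob:.4g}")
```

Under these assumptions, seven digits give several thousand expected colliding pairs across two million objects, which matches the remark that collisions happen all the time, while twelve digits bring the chance of any collision below one percent.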

edited Apr 25 '18 at 07:45

answered Jan 09 '14 at 08:16

VonC


I'm curious, what *is* the largest git project? Or at least, what are some of the best examples out there of absolutely massive git repos? – GMA Mar 18 '15 at 13:53
1

@GeorgeMillo As mentioned in http://blogs.atlassian.com/2014/05/handle-big-repositories-git/, there are two kinds of huge repos: huge history or huge binaries. An example of a huge git repo was the Facebook one: https://news.ycombinator.com/item?id=7648237 (they have since switched to their own version of Mercurial) – VonC Mar 18 '15 at 13:57
You mean 7 hex digits (28 bits), not 7 bits. – Thomas Jacob Apr 25 '18 at 07:42
@ThomasJacob Thank you. I have edited the answer accordingly. – VonC Apr 25 '18 at 07:46
@ThomasJacob Note: SHA1 won't always be the default hash algorithm used in Git. This is evolving: https://stackoverflow.com/a/47838703/6309 – VonC Apr 25 '18 at 07:54
