
Let's say we're trying to produce a build of a security-critical project.

When I run git clone <url>, how can I assure myself that the code I checked out is the same as the code that's on GitHub? Would it suffice to just compare commit hashes?

Assuming that my git installation isn't malicious, should I just assume that git has me covered?

justinmoon
  • Use `git log`, look at the commit hash at the top and compare it to the hash of the latest commit on GitHub. But as long as no one has stolen the SSL certificate, `git clone` should not check out anything else (assuming you use the correct URL). – dan1st Dec 19 '20 at 19:36

2 Answers


When I run git clone <url>, how can I assure myself that the code I checked out is the same as the code that's on GitHub?

The easiest way is to use git log, look at the commit hash at the top and compare it to the hash of the latest commit on GitHub.
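When comparing, use the full 40-character hash rather than the 7-character abbreviation many views show, because a short prefix is much easier to match by brute force than a full hash. A minimal sketch of such a check, assuming the hash from GitHub was copied by hand; `same_commit` is a hypothetical helper name, not a Git command:

```shell
# Hypothetical helper: succeed only if two commit IDs are identical and
# full-length. Abbreviated or non-hex IDs are rejected with exit status 2.
same_commit() {
  local a b
  # Normalize case so "ABC..." and "abc..." compare equal.
  a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
  b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  # Reject empty input and anything that is not plain hex.
  case "$a" in
    '' | *[!0-9a-f]*) return 2 ;;
  esac
  # Accept only full SHA-1 (40 hex digits) or SHA-256 (64 hex digits) IDs.
  if [ "${#a}" -ne 40 ] && [ "${#a}" -ne 64 ]; then
    return 2
  fi
  [ "$a" = "$b" ]
}
```

Usage inside the clone: `same_commit "$(git rev-parse HEAD)" "<full hash copied from GitHub>" && echo match`.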

Would it suffice to just compare commit hashes?

Git uses SHA-1, and SHA-1 collisions can be produced, so an attacker could in principle create different content with the same commit hash. Another possibility is to create commits with a hash that does not match the data.
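For context on what the hash actually covers: a Git object ID is a hash of the object's content plus a short header. For a file (a blob) in a SHA-1 repository it is the SHA-1 of `blob <size>`, a NUL byte, and then the file's bytes; a commit in turn records its tree's ID and its parents' IDs, so changing any file changes the commit hash. The sketch below recomputes a blob ID with standard tools, assuming GNU coreutils `sha1sum` is available (macOS ships `shasum` instead); `git_blob_id` is a hypothetical name. Its output should match `git hash-object <file>`.

```shell
# Hypothetical helper: compute a file's Git blob ID the way a SHA-1
# repository does, i.e. SHA-1 over "blob <size>\0" followed by the content.
git_blob_id() {
  local size
  # Byte count of the file; tr strips padding that some wc versions add.
  size=$(wc -c < "$1" | tr -d ' ')
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

For example, a file containing `hello world` plus a newline gives `3b18e512dba79e4c8300dd08aeb37f8e728b8dad`, the same ID `git hash-object` reports for it.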

Assuming that my git installation isn't malicious, should I just assume that git has me covered?

As long as you use HTTPS and no one has stolen GitHub's SSL certificate (if someone had stolen it, that would be a very big deal, and they could also change the web UI), git clone will not check out anything else (assuming you use the correct URL).

If you use plain HTTP, you don't have the protection of an SSL certificate, and anyone between you and the server could send you something else.
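One way to act on this is to refuse any remote that is not HTTPS before cloning, and to re-check an existing clone's remote with `git remote get-url origin`. A minimal sketch; `require_https` is a hypothetical helper, and it deliberately accepts only `https://` URLs. Plain `http://` and the unauthenticated `git://` protocol are rejected, and so are SSH remotes, even though SSH authenticates the server by its host key.

```shell
# Hypothetical guard: succeed only for https:// URLs, where the server is
# authenticated by its TLS certificate. Everything else (http://, git://,
# SSH-style URLs) is rejected with a message on stderr.
require_https() {
  case "$1" in
    https://*) return 0 ;;
    *)
      echo "refusing non-HTTPS remote: $1" >&2
      return 1
      ;;
  esac
}
```

Usage: `require_https "$url" && git clone "$url"`, or `require_https "$(git remote get-url origin)"` inside an existing clone.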

dan1st

That is exactly the point Linus Torvalds was making in his 2007 Google presentation of Git (video), when Git was two years old: see slide 10.

Around the 57-minute mark:

Having a good hash is good for being able to trust your data, it happens to have some other good features, too. It means that when we hash objects, we know that the hashes are actually well distributed and we don’t have to worry about certain distribution issues.

So internally it means from an implementation standpoint, we can trust that the hashes are so good that we can use hashing algorithms and know there are no bad cases.

So there are some reasons to like the cryptographic side too, but it’s really about the ability to trust your data.

I guarantee you, if you put your data in git, you can trust the fact that five years later, after it was converted from your harddisc to DVD to whatever new technology and you copied it along, five years later you can verify the data you get back out is the exact same data you put in.
And that is something you really should look for in a source control management system.
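In practice, that verification can be done with `git fsck`, which validates every object (including that its content matches its ID) and checks that all references between objects resolve, combined with comparing HEAD against a commit ID you wrote down earlier. A minimal sketch; `still_same` is a hypothetical helper name:

```shell
# Hypothetical helper: check that a repository copy still holds exactly the
# history identified by a full commit ID recorded earlier.
# Usage: still_same <repo-dir> <recorded-commit-id>
still_same() {
  local dir=$1 want=$2 have head
  # Validate every object and the links between them; fail on any error.
  git -C "$dir" fsck --full --no-progress >/dev/null 2>&1 || return 1
  # Resolve the recorded ID to a commit; empty if it is not in the repository.
  have=$(git -C "$dir" rev-parse --verify -q "$want^{commit}")
  head=$(git -C "$dir" rev-parse HEAD)
  [ -n "$have" ] && [ "$have" = "$head" ]
}
```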

Even if SHA-1 can in theory be broken (different content with the same hash), that is harder to exploit against Git, and Git will soon support SHA-2 (SHA-256) hashes for its commits anyway.
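For reference, Git 2.29 and later can create repositories that use SHA-256 object IDs via `git init --object-format=sha256`. A minimal sketch that builds a throwaway SHA-256 repository and prints its commit ID, which is 64 hex digits instead of 40; `sha256_head_demo` is a hypothetical name, and whether a given hosting service accepts such repositories is not assumed here.

```shell
# Hypothetical demo: create a throwaway repository that uses SHA-256
# object IDs (needs Git 2.29 or later) and print its first commit's ID.
sha256_head_demo() {
  local dir
  dir=$(mktemp -d)
  git -C "$dir" init -q --object-format=sha256
  printf 'demo\n' > "$dir/README"
  git -C "$dir" add README
  git -C "$dir" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m demo
  # Prints a 64-hex-digit ID instead of SHA-1's 40.
  git -C "$dir" rev-parse HEAD
  rm -rf "$dir"
}
```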

So yes, comparing your HEAD commit (git rev-parse) with the one on GitHub (git ls-remote) is enough:

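```shell
# Full ID of the commit you checked out (avoid --short when verifying)
git rev-parse HEAD
# ID the remote currently advertises for HEAD
git ls-remote <url> HEAD
```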
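The two commands can be wrapped into a single check that compares the full IDs instead of reading them by eye. A minimal sketch; `head_matches_remote` is a hypothetical helper name, and it compares against whatever the remote advertises as `HEAD` (normally its default branch), so a clone of another branch or tag would need a different ref:

```shell
# Hypothetical helper: compare the commit checked out in a clone with the
# commit the remote currently advertises as HEAD.
# Usage: head_matches_remote <clone-dir> [remote-name-or-url]
head_matches_remote() {
  local dir=$1 remote=${2:-origin} mine theirs
  # Full ID of the local checkout (not --short).
  mine=$(git -C "$dir" rev-parse HEAD) || return 2
  # ls-remote prints "<id><TAB><ref>" lines; keep the ID whose ref is HEAD.
  theirs=$(git -C "$dir" ls-remote "$remote" HEAD | awk '$2 == "HEAD" { print $1 }')
  echo "local:  $mine"
  echo "remote: ${theirs:-<none>}"
  [ -n "$theirs" ] && [ "$mine" = "$theirs" ]
}
```

It exits with status 0 on a match, 1 if the IDs differ or the remote HEAD could not be read, and 2 if the local HEAD could not be resolved.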
VonC