
Yesterday one of my team's check-ins corrupted our GitHub repo. GitHub reported this error:

$ git fsck
error: sha1 mismatch 87859f196ec9266badac7b2b03e3397e398cdb18

error: 87859f196ec9266badac7b2b03e3397e398cdb18: object corrupt or missing
missing blob 87859f196ec9266badac7b2b03e3397e398cdb18

When I tried to pull onto a different machine, I got this:

Hyperion:Convoy-clone saalon$ git fsck
warning in tree 5b7ff7b4ac7039c56e04fc91d0bf1ce5f6b80a67: contains zero-padded file modes
warning in tree 5db54a0cdcd5775c09365c19c061aff729579209: contains zero-padded file modes
broken link from    tree 6697c12387f8909cfe7250e9d5854fd6713d25c1
              to    blob 87859f196ec9266badac7b2b03e3397e398cdb18
dangling tree 144becf61ae14cec34b6af1bd8a0cf4f00d346d1
missing blob 87859f196ec9266badac7b2b03e3397e398cdb18

(I get the zero-padded file mode warnings on both the offending machine and the second machine I pulled to. I get the broken link error only on the second machine.)

I tracked down the blob to the specific file that's the problem, but after going through the Git FAQ's process on fixing a broken link error, I had no luck.

I went through GitHub's documentation and found a process for deleting the master branch on GitHub and re-pushing it from the offending machine. I tried this, but when I went to re-push the master branch, I got the following error:

fatal: SHA1 COLLISION FOUND WITH 87859f196ec9266badac7b2b03e3397e398cdb18 !
error: unpack failed: index-pack abnormal exit

I've got an open ticket with GitHub, but it's taking them a long time to respond. Any idea what the problem might be? Is there a problem at GitHub that they need to fix, or is there something I can do to take care of this?

saalon
  • Specifically, I followed these instructions on fixing a broken repo: https://git.wiki.kernel.org/index.php/GitFaq#How_to_fix_a_broken_repository.3F – saalon Feb 01 '11 at 15:54
  • And these on removing and re-pushing master: http://help.github.com/egit-corruption/ – saalon Feb 01 '11 at 15:55
  • And what did you do while going through the Git FAQ's process? Did you find a correct file version with the same hash? (item C in FAQ) – ssmir Feb 01 '11 at 19:26
  • Yes, I did, and I've since fixed the git fsck problems on my local repo. Unfortunately, in trying to fix the problem, I followed github instructions to delete and re-push the master branch, but the re-push - both before and after fixing the missing blob - is giving me the SHA1 Collision error, so I can't get the fixed repo to github. Not sure if I've done something wrong or if there's something wrong at github. – saalon Feb 01 '11 at 19:33
  • @saalon Do you have a fixed local repo now? Are you able to clone it locally? – ssmir Feb 01 '11 at 19:48
  • @ssmir I do, and I am able to do a local clone ("git clone --local Convoy-clone/ local-clone/"). – saalon Feb 01 '11 at 19:54
  • @saalon You'd better try without --local to make the clone work more like a clone over a network. And did you change the default branch on GitHub (as in Fixing egit corruption) before deleting the master? – ssmir Feb 01 '11 at 20:24
  • @ssmir Success on the clone without --local. And yes, I changed the default branch (and it remains changed now, in absence of the master branch). – saalon Feb 01 '11 at 20:28
  • @saalon Are you able to clone your GitHub repo? Does the default branch now point to the first commit? Did you remove all other branches from the remote repo? (to decrease the chance that the corrupted blob is still in the remote repo) – ssmir Feb 01 '11 at 20:38
  • @ssmir In order: Yes, I can clone. Yes, first commit. No, but I can't, we have a release candidate branch (but the offending commit was never made to it or cherry-picked into it). – saalon Feb 01 '11 at 20:44
  • @saalon So `git push origin master` fails? What about e.g. `git push origin master:refs/heads/master2` or `git push -v origin master` ? – ssmir Feb 01 '11 at 21:06

4 Answers


After some back and forth with GitHub (and some troubleshooting help from ssmir), it turned out this problem was split between a thing I needed to solve and a thing GitHub needed to solve.

What needed to be solved on my end was this:

Hyperion:Convoy-clone saalon$ git fsck
warning in tree 5b7ff7b4ac7039c56e04fc91d0bf1ce5f6b80a67: contains zero-padded file modes
warning in tree 5db54a0cdcd5775c09365c19c061aff729579209: contains zero-padded file modes
broken link from    tree 6697c12387f8909cfe7250e9d5854fd6713d25c1
              to    blob 87859f196ec9266badac7b2b03e3397e398cdb18
dangling tree 144becf61ae14cec34b6af1bd8a0cf4f00d346d1
missing blob 87859f196ec9266badac7b2b03e3397e398cdb18

Notice the broken link from a tree to a blob. A tree is Git's record of a directory and a blob holds a file's contents, so this is saying that a directory lists a file whose contents are missing from the repo. Someone added a file to their local repo and pushed it, but the file itself didn't end up in the remote repo. Now everyone who pulls down the repo gets the same broken link.

The Git FAQ's instructions on fixing a broken repository do a good job of explaining what to do if you get the problem, but in the midst of the actual crisis, I found the description a little lacking in context. It gave a clear list of steps but not a great idea of the why, at least not for someone who's still a little new to Git.

Basically, what you need to do is figure out what file that missing blob is, track down what computer checked it in last and go to work on their local repo. Their computer has both the SHA1 link to the file and the contents of the file itself. Everyone else has a pile of broken.

So first, we need to find out what blobs/files are in that tree. To do that, you use git ls-tree.

git ls-tree 6697c12387f8909cfe7250e9d5854fd6713d25c1

In my case, that listed only one file: the file that was corrupt. In your case, it might give a whole list of files, in which case what you need to do is match up the blob/file's SHA1 hash to the one mentioned in the broken link error. In my case, it was this:

100644 blob 87859f196ec9266badac7b2b03e3397e398cdb18    short_description.html
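If the tree lists many entries, matching the hash by eye gets tedious. Here is a minimal Python sketch of that matching step, assuming you have saved the text printed by git ls-tree into a string. The function name find_tree_entry is made up for illustration, and the parsing assumes the default ls-tree output format with ordinary file names.

```python
def find_tree_entry(ls_tree_output, wanted_sha):
    """Return (mode, type, name) for the entry with the given object id, or None.

    Assumes the default `git ls-tree` layout: "<mode> <type> <sha>\t<name>".
    File names that git quotes or escapes are not handled in this sketch.
    """
    for line in ls_tree_output.split("\n"):
        parts = line.split(None, 3)  # mode, type, sha, then the rest is the name
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        mode, obj_type, sha, name = parts
        if sha == wanted_sha:
            return mode, obj_type, name
    return None


if __name__ == "__main__":
    sample = "100644 blob 87859f196ec9266badac7b2b03e3397e398cdb18\tshort_description.html\n"
    print(find_tree_entry(sample, "87859f196ec9266badac7b2b03e3397e398cdb18"))
```

The hash to look for is the one named in the "broken link ... to blob" line of the git fsck output.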

Notice that it doesn't give you the directory the file is actually supposed to be in. That's kind of frustrating, but with a little detective work you can find it. The file might be uniquely named, in which case you can just do a find for the file name. Or you can look through your commit history and see when and where a file called short_description.html was placed.

Here's the part the GitFaq wasn't entirely clear on. They say to recreate the file, then run this command:

git hash-object -w db/content/page_parts/venues/86/short_description.html 

But what is that doing?

Basically, when you run git hash-object, it returns the SHA1 hash for that file. And (here's the important part) the -w flag also writes a blob for the file into the repo's object database, and a blob was just what we were missing. What the FAQ doesn't make clear, though, is that for this to work, the file needs to match exactly the file that initially caused the problem. In other words, if that short_description.html file had content in it, you can't just create a blank file and run hash-object. If you do, the blob's SHA1 hash won't match the one git is missing, and that broken link will still be broken.
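To see why the content has to match byte for byte, it helps to know how a blob's id is derived: Git hashes a short header ("blob", the content length, and a NUL byte) followed by the raw content. The Python sketch below reproduces that calculation so you can check a candidate file before writing anything into the repo. The helper names are made up for illustration, and it assumes Git applies no line-ending conversion or other filters to the file, which would change the bytes that actually get hashed.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Compute the id Git assigns to a blob with exactly these bytes."""
    header = b"blob %d\0" % len(content)  # "blob <size>" followed by a NUL byte
    return hashlib.sha1(header + content).hexdigest()


def file_matches_blob(path: str, wanted_sha: str) -> bool:
    """Check whether the file at `path` would hash to the missing blob's id."""
    with open(path, "rb") as fh:
        return git_blob_sha1(fh.read()) == wanted_sha


if __name__ == "__main__":
    # An empty file always hashes to the same well-known id, which is why a
    # blank placeholder cannot stand in for a file that had content.
    print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

If file_matches_blob returns False for your recreated file, running git hash-object -w on it would create a different blob and leave the broken link in place.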

This is why you need to be on the offending machine's repo. Everyone else has a link but no file and no blob. The offending machine (hopefully) still has the original file. In my case, they didn't have the original file (in my flailing, it had been deleted inadvertently), but when I looked at their commit history on their box, the diff contained the content of the file that had been committed but never made it to GitHub. I copied that out, recreated the file and ran hash-object. The next time I ran git fsck, the broken link was gone.
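Copying file content out of a diff by hand makes it easy to lose a trailing newline or pick up a stray space, either of which changes the hash. As a rough sketch of a more careful approach, the Python below rebuilds a file from the diff text of the commit that added it. The function name is hypothetical, and it assumes the diff covers a single newly added text file, so every line inside the hunk starts with "+".

```python
def added_file_from_diff(diff_text: str) -> str:
    """Rebuild the content of a newly added file from its unified diff.

    Assumes a single-file diff for a file that did not exist before.
    """
    lines = []
    in_hunk = False
    ends_with_newline = True
    for line in diff_text.split("\n"):
        if line.startswith("@@"):
            in_hunk = True  # header lines such as "+++ b/..." come before this
            continue
        if not in_hunk:
            continue
        if line.startswith("+"):
            lines.append(line[1:])  # drop the leading "+" marker only
        elif line.startswith("\\"):
            # "\ No newline at end of file" marker
            ends_with_newline = False
    body = "\n".join(lines)
    if lines and ends_with_newline:
        body += "\n"
    return body
```

Write the result out in binary mode and confirm its hash against the missing blob before running hash-object -w.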

One note: technically, this problem can be fixed on someone else's repo, provided you can recreate the missing file. In my case, I actually had the file created on the offending machine, but had it e-mailed to me and fixed the problem in a clean repo on a different system. The important thing is recreating the file exactly so it generates the same sha1 hash that the repo is missing.

As for the SHA1 collision problem I got when I tried to push to github? This ugly sucker?

fatal: SHA1 COLLISION FOUND WITH 87859f196ec9266badac7b2b03e3397e398cdb18 !
error: unpack failed: index-pack abnormal exit

That was a problem on GitHub's side that they needed to fix.

Zombo
saalon
  • It's vanishingly unlikely that that was a genuine collision (i.e., two distinct objects with the same SHA1). According to the book [Pro Git](http://git-scm.com/book/ca/Git-Tools-Revision-Selection#A-SHORT-NOTE-ABOUT-SHA-1), "If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (1 million Git objects) and pushing it into one enormous Git repository, it would take 5 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision." ... – Keith Thompson Dec 15 '11 at 20:46
  • ... "A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night." – Keith Thompson Dec 15 '11 at 20:46

Just a reminder: a small likelihood of something happening is not the same as it being impossible. You can get hash collisions with Git's use of SHA-1. Once you have two files that collide, the likelihood for you is 100%, and at that point the theoretical odds are slim consolation. Add a space to one of them, which changes its hash, and you'll be fine though.
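For a sense of scale, the odds of an accidental collision can be estimated with the standard birthday approximation, p ≈ 1 - exp(-n² / (2 · 2^160)) for n objects and a 160-bit hash. The Python sketch below is only that back-of-the-envelope estimate, with made-up function names. It treats SHA-1 output as uniformly random and says nothing about deliberately constructed collisions.

```python
import math

SHA1_BITS = 160


def collision_probability(n_objects: float, bits: int = SHA1_BITS) -> float:
    """Birthday approximation of the chance that any two of n objects collide."""
    return -math.expm1(-(n_objects * n_objects) / (2 * 2.0 ** bits))


def objects_for_probability(p: float, bits: int = SHA1_BITS) -> float:
    """Roughly how many objects are needed to reach collision probability p."""
    return math.sqrt(2 * 2.0 ** bits * math.log(1 / (1 - p)))


if __name__ == "__main__":
    print(objects_for_probability(0.5))      # on the order of 1e24 objects
    print(collision_probability(1_000_000))  # a repo with a million objects
```

Numbers like these are why a "collision" error on a normal-sized repo more plausibly points to a corrupt or mismatched object than to two genuinely different files sharing a hash.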

phorgan1

I ran into the same issue and ran:

git prune  
git gc  

which mentioned

error: bad ref for refs/remotes/origin/ticketName

so I removed the reference and that fixed the issue:

rm .git/refs/remotes/origin/ticketName
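If you would rather look for damaged refs before deleting anything, here is a minimal Python sketch that scans the loose ref files under .git/refs and reports any whose content does not look like a ref. It is an illustration under several assumptions: the function name is made up, the repo uses 40-character SHA-1 ids, packed refs are ignored, and a ref that is well-formed but points at a missing object will not be caught.

```python
import os
import re

HEX40 = re.compile(r"^[0-9a-f]{40}$")


def find_suspect_loose_refs(git_dir):
    """List loose ref files (relative to git_dir) whose content looks invalid."""
    suspects = []
    refs_root = os.path.join(git_dir, "refs")
    for dirpath, _dirnames, filenames in os.walk(refs_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                content = fh.read().decode("ascii", errors="replace").strip()
            # A healthy loose ref holds one object id or a "ref: <target>" line.
            if HEX40.match(content) or content.startswith("ref: "):
                continue
            suspects.append(os.path.relpath(path, git_dir))
    return sorted(suspects)


if __name__ == "__main__":
    for ref in find_suspect_loose_refs(".git"):
        print("suspect ref:", ref)
```

Anything it reports is only a candidate. Check that the ref is one you can afford to lose (a remote-tracking ref can be re-fetched) before removing it.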
depperm
  • Basically worked for me too (I didn't have any local changes so I didn't care about possibly losing data). git prune reported a couple of invalid references I deleted. Git gc reported some "unlink of file ... failed" which I ignored. After that, I could git fetch again... – mmey Apr 22 '16 at 14:00

This happened to me recently on "git pull" from an AWS Git server. The commands below fixed the issue. Thanks.

git prune
git gc

Tin Torres