
I have yet another example of doing a git rm -rf without an initial commit. (I realized I had added lots of useless files and wanted to add some filters.)

Now I am left with 23000 dangling blobs with no tree, but with a complete Git history!

I'll use a script to loop over the blob names (using `git show <blobname> > <filename>`), but can I associate the filenames from the history with the blobs?
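A minimal sketch of that loop, assuming a POSIX shell run from inside the repository. It reads the `dangling blob <sha>` lines that `git fsck` prints and writes each blob's content to a file named after its SHA-1. The function name `dump_dangling_blobs` and the default output folder `recovered` are hypothetical. It uses `git cat-file blob`, which prints the same raw content that `git show` prints for a blob. It cannot restore the original filenames, because that mapping lived only in the index.

```shell
#!/bin/sh
# Hypothetical helper: write every dangling blob to OUTDIR/<sha>.
# Run from inside the repository; OUTDIR defaults to "recovered".
dump_dangling_blobs() {
    outdir=${1:-recovered}
    mkdir -p "$outdir"
    # git fsck reports objects nothing points to as "dangling <type> <sha>";
    # notices about the unborn branch go to stderr and are discarded.
    git fsck 2>/dev/null |
    while read -r kind type sha; do
        if [ "$kind" = dangling ] && [ "$type" = blob ]; then
            # Raw blob content; the original path is not stored in the blob.
            git cat-file blob "$sha" > "$outdir/$sha"
        fi
    done
}
```

Usage: `dump_dangling_blobs recovered` leaves one file per blob in `recovered/`, named only by its hash.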

Palec
PaulDj
  • Hmm, I originally thought you typed `rm -rf`. You can still get it back; you need to find the root tree object. – Alex Brown Aug 31 '12 at 21:59
  • Check the type of those objects: are they ALL blobs, or is there a tree in there anywhere? Either way, you can use `git cat-file` to see what each one contains for comparison. Also have a look at their date stamps in case that gives you a clue. – Philip Oakley Aug 31 '12 at 22:02
  • @Alex: How do I find the root tree object? Is it in one of the blobs? – PaulDj Sep 01 '12 at 09:15
  • @Philip: Indeed, they are all blobs. I can use `git cat-file`, but what do I compare the output with? I only have the list of filenames. – PaulDj Sep 01 '12 at 09:18
  • Sounds a bit like you've got real problems. I hadn't realised that `git add` doesn't create a tree object but waits until the commit for that, leaving the details in the index. Unfortunately, `git rm` clears all of that from the index. I guess it's time to triage the blobs into binary, ASCII and UTF-8 to try to reduce the list size; 23,000 is a lot of crud. – Philip Oakley Sep 02 '12 at 14:10
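To answer "are they ALL blobs?" in one pass, here is a sketch that tallies the dangling objects by the type `git fsck` reports (`git cat-file -t <sha>` gives the same answer for a single object). The function name is hypothetical. Any `tree` line in the output would be a candidate root tree worth inspecting with `git cat-file -p <sha>`.

```shell
#!/bin/sh
# Hypothetical helper: print "<type> <count>" for every kind of dangling object.
count_dangling_by_type() {
    # Field 1 is "dangling", field 2 the object type (blob, tree, commit, tag).
    git fsck 2>/dev/null |
    awk '$1 == "dangling" { n[$2]++ } END { for (t in n) print t, n[t] }'
}
```

If the only line printed is `blob <count>`, there is no tree left to rebuild the paths from.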

1 Answer


For anyone who has made (or will make) the exact mistake I made, here's the end of the story.

First off, a brief summary of what I did.

  1. Created an empty repository.
  2. Moved many files/directories into it.
  3. `git add .`
  4. Realized that I had just added a TON of useless/not-so-important/redundant files.
  5. `git rm -rf`, intending to then add some filters to .gitignore.
  6. Realized that all my files were gone...
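In hindsight, step 5 only needed to remove the files from the index, not from disk. A sketch of that, assuming the goal is to un-stage everything in a repository with no commits yet: `--cached` limits `git rm` to the index, so the working-tree files survive (this is also what `git status` suggests for un-staging before the first commit). The wrapper name `unstage_all` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: empty the index but keep every file on disk.
unstage_all() {
    # --cached: touch only the index; -r: recurse into directories;
    # -q: suppress the per-file "rm '...'" lines.
    git rm -r -q --cached .
}
```

After that, add the patterns to .gitignore and run `git add .` again.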

I tried all sorts of data recovery tools, with no luck. The best I could do was the following procedure.

  1. Immediately copy the working directory to a different volume (external HD).
  2. `git fsck --lost-found`, possibly with `--unreachable --cache`.
    This creates the folder .git/lost-found/other, in which all (most?) of the original files are re-created, but without filenames. The problem then was how to recover the file names. Unfortunately, all the objects I recovered were blobs, with no root tree, so I had no information about the directory structure.
  3. Even though I had the complete list of lost filenames (only names, not sizes), I could not find any root tree, so this information was basically useless.
  4. In general, one can write a script that runs `file <filename>` to detect the type of each recovered file and attaches the corresponding extension. The problem of matching files with filenames still remains.
    Alternatively, one can use brute force. For instance, to recover PDFs, I sorted the recovered files by size, attached a .pdf extension to them, and looked at them one by one. The files that were actual PDFs opened; the others didn't.
  5. To recover text-based files (.txt, .tex, .c, .h, ...), I used grep to search for a string that I remembered belonged to a specific file (or group of files).
  6. Now I keep the directory with all the recovered files, and every time I need one of them, I use a slight variant of step 4.
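The `file`-based renaming from step 4 can be sketched as follows, assuming a `file` that supports `--mime-type` and `-b` (GNU and BSD versions do). It copies rather than renames each blob out of .git/lost-found/other, so the originals stay untouched. The function name, the destination folder `sorted`, and the small MIME-to-extension table are assumptions to extend as needed.

```shell
#!/bin/sh
# Hypothetical helper: copy each recovered blob to DST/<name>.<guessed ext>.
guess_extensions() {
    src=${1:-.git/lost-found/other}
    dst=${2:-sorted}
    mkdir -p "$dst"
    for f in "$src"/*; do
        # Skip the literal pattern when the folder is empty, and any non-files.
        [ -f "$f" ] || continue
        # -b omits the filename, --mime-type prints e.g. "application/pdf".
        case $(file --mime-type -b "$f") in
            application/pdf) ext=pdf ;;
            image/png)       ext=png ;;
            image/jpeg)      ext=jpg ;;
            text/html)       ext=html ;;
            text/x-c)        ext=c ;;
            text/x-tex)      ext=tex ;;
            text/*)          ext=txt ;;
            *)               ext=bin ;;
        esac
        cp "$f" "$dst/$(basename "$f").$ext"
    done
}
```

The extension only tells you which program can open a file; matching it to one of the lost names is still manual.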
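The grep search from step 5 pairs well with the size sorting from step 4: find the recovered files that contain a phrase you remember, then list the hits largest first so they can be checked one by one. A sketch with a hypothetical function name; `-F` treats the phrase literally and `-I` skips binary files, so compressed formats such as most PDFs will not match.

```shell
#!/bin/sh
# Hypothetical helper: print "<bytes> <path>" for every recovered file
# containing PHRASE, largest first.
find_blobs_containing() {
    phrase=$1
    dir=${2:-.git/lost-found/other}
    # -r: recurse, -l: names only, -F: fixed string, -I: ignore binary files.
    grep -rlFI -- "$phrase" "$dir" |
    while read -r f; do
        # tr strips the padding some wc implementations add.
        printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -rn
}
```

For example, `find_blobs_containing 'int main'` narrows 23,000 anonymous blobs down to the C sources that define `main`.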

Good luck!

PaulDj