@LeGEC provided the final pieces for me to get this together but I think it's worthwhile to present the full approach I used. Note: I expect that a lot of the things I was able to do are specific to my case BUT there are some things that can be generalized.
When looking at the results of git fsck, I found that there were several dangling commits. When I checked out those hashes, I found segments of good commits. So a repository which had an original structure of
(a)->(b)->(c)->(d)->(e)->(f)->(g)->(h)->(i)->(j)
after the, let's call it, "ill-advised" rm command might be left in a state like
(b)->(c) (e)->(f) (h)->(i)->(j)
As stated in the question, the backup was very old and had the form
(a)->(b)
but that's it. What one can do is use git replace to try and solve this problem. BE WARNED: git replace seems to be an excellent tool to truly destroy your repository. I did this on a copy of my original repository and I am VERY glad it wasn't the real deal!
We will build our new repository on a new (good) foundation. We first initialize a fresh repository from the backup we do have.
$ mkdir -p my/new/fixed/repository
$ cd my/new/fixed/repository
$ git init
Now, from our backup (which doesn't cover the full space of the corrupted repository) we will unpack the existing structure such as it is.
$ git remote add origin /path/to/backup/repository
$ git fetch origin
$ git checkout --track origin/my-broken-branch # This may not be necessary
To avoid messing anything up with our corrupted repository, we make a copy
$ cd /path/to/repository/root
$ mkdir repository-copy
$ cp -R /path/to/broken/repository /path/to/repository-copy
$ cd /path/to/repository-copy
First things first, let's try to use our backup repository to fix what we can:
$ git remote add backup /path/to/backup/repository
$ git unpack-objects < /path/to/backup/repository/.git/objects/pack/pack-*.pack
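If the backup holds more than one pack file, the glob above expands to several names and the shell will refuse the redirect, so a loop is safer. A minimal sketch, assuming it is run from inside the repository copy; the helper name unpack_all_packs is mine, not part of git:

```shell
# Hypothetical helper: feed every pack file in a directory to
# git unpack-objects, one at a time. Run it from inside the repository copy.
unpack_all_packs() {
    pack_dir=$1
    for pack in "$pack_dir"/pack-*.pack; do
        # Skip the literal pattern when no pack file matches.
        [ -e "$pack" ] || continue
        # Objects that already exist in the repository are not unpacked again.
        git unpack-objects -q < "$pack"
    done
}
```

It would be called as unpack_all_packs /path/to/backup/repository/.git/objects/pack.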
Okay, let's see what the damage is:
$ git fsck
broken link from commit <SHA1>
to commit <SHA2>
broken link from tree <SHA3>
to blob <SHA4>
...
dangling commit <SHA5>
...
missing commit <SHA2>
...
missing blob <SHA4>
...
dangling commit <SHA6>
...
Of interest are the dangling commits, because those are likely to be the little sub-branches that we want to stitch back together. Note that these commits are NOT always listed in chronological order. For me the order happened to be (from oldest to newest) <SHA5>-<SHA6>, but you will likely have your own knot to untangle. You can check the commit date/time by running
$ git show -s <SHAX>
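With more than a handful of dangling commits, checking them one at a time gets tedious. The following is a rough sketch that lists every dangling commit reported by git fsck, sorted from oldest to newest by committer time. The helper name dangling_by_date is hypothetical, and it assumes the dangling commit objects themselves are still readable:

```shell
# Hypothetical helper: print "<epoch> <date> <sha> <subject>" for each
# dangling commit reported by git fsck, oldest first.
dangling_by_date() {
    git fsck 2>/dev/null |
    awk '$1 == "dangling" && $2 == "commit" { print $3 }' |
    while read -r sha; do
        # %ct is the committer time as a Unix timestamp, used as the sort key.
        git show -s --format='%ct %ci %H %s' "$sha"
    done |
    sort -n
}
```

Keep in mind that git fsck only reports the newest commit of each orphaned segment as dangling; the older commits in that segment are reachable from it.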
One thing to note at this point: if you are in the broken repository copy and run git log, you will be able to traverse the history until you run into <SHA1>, at which point you will get the error:
error: Could not read <SHA2>
fatal: Failed to traverse parents of commit <SHA1>
So we need to replace the parent of <SHA1> with a commit that is actually good. The pattern for this is called a graft, but doing a pure graft is no longer considered best practice (see "How do git grafts and replace differ? (Are grafts now deprecated?)") because of the new(er) best practice, git replace.
So I now make <SHA6> the parent of <SHA1>:
$ git replace --graft <SHA1> <SHA6>
$ git fsck
broken link from commit <SHA1>
to commit <SHA2>
broken link from tree <SHA3>
to blob <SHA4>
...
broken link from commit <SHA7>
to commit <SHA8>
So a new broken link has appeared. If I investigate that commit using git log, I find that its commit time lines up with that of the remaining dangling commit, <SHA5>, so I'm going to graft those two together. Note that this may not be a safe thing to do if you have lots of people working on this repository but, in this case, I believe it to be okay.
$ git replace --graft <SHA7> <SHA5>
$ git fsck
broken link from commit <SHA1>
to commit <SHA2>
broken link from tree <SHA3>
to blob <SHA4>
...
broken link from commit <SHA7>
to commit <SHA8>
No new dangling commits appeared and, in my case, the history was able to connect to the commits from my backup repository. In other cases I imagine this will not always be true. If so, you can eventually get to the point where you graft the head of the backup repository in as the parent of the remaining commit with a bad link.
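Since git replace is easy to get wrong, it can help to review what each graft actually changed. A small sketch, with the hypothetical helper name show_grafts, that prints the original and the grafted first parent of every replaced commit:

```shell
# Hypothetical helper: for every replace ref, print the commit together
# with its original first parent and the first parent git now uses.
show_grafts() {
    git for-each-ref --format='%(refname)' refs/replace/ |
    while read -r ref; do
        sha=${ref#refs/replace/}
        # --no-replace-objects makes git ignore the replace refs.
        old=$(git --no-replace-objects rev-parse --verify -q "$sha^" || echo none)
        new=$(git rev-parse --verify -q "$sha^" || echo none)
        printf '%s: parent %s -> %s\n' "$sha" "$old" "$new"
    done
}
```

A graft that went wrong can be dropped again with git replace -d followed by the commit hash.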
Now we must deal with the missing blobs. You can try and repair them following Linus' method or, if you are willing to accept the missing history, you can use git replace again to excise them from the history. The general approach is
$ git ls-tree <SHA3>
...
100644 blob <SHA4> my-magic-file
...
$ git log --raw --all --full-history -- subdirectory/my-magic-file | grep -B 20 -A 20 "<SHA4>" # You may only need the first few characters of <SHA4>
# commit information after missing blob
# commit information for missing blob
# commit information before missing blob
$ git replace --graft <commit-after-missing-blob> <commit-before-missing-blob>
Repeat this until git rev-list --objects my/branch runs to completion.
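The stopping condition can be wrapped in a tiny check so it is easy to re-run after each graft. This is only a sketch; the helper name history_is_complete is an assumption of mine:

```shell
# Hypothetical helper: succeed only when every commit, tree and blob
# reachable from the given ref can be found in the object store.
history_is_complete() {
    git rev-list --objects "$1" >/dev/null 2>&1
}
```

For example, history_is_complete my/branch && echo "nothing left to graft". When it fails, running git rev-list --objects my/branch directly shows which object is still missing.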
Now you need to remove the extraneous commits. Fortunately, a newer tool exists to do just this: git-filter-repo. It will make our grafts permanent and rewrite the history.
$ git filter-repo --force
$ git fsck
Checking object directories: 100%...
Checking objects: 100%...
Now let's see if we can successfully fetch the repaired branch from our previously broken repository.
$ cd /path/to/my/new/fixed/repository
$ git remote add broken /path/to/my/broken/repository
$ git fetch broken my/branch
...
From /path/to/my/broken/repository
* branch my/branch -> FETCH_HEAD
* [new branch] my/branch -> broken/my/branch
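Before merging, it may be worth confirming that the fetched branch really does share history with the backup-based branch. A minimal sketch using git merge-base; the helper name shares_history is hypothetical:

```shell
# Hypothetical helper: succeed when the two refs have at least one
# common ancestor, i.e. their histories are connected.
shares_history() {
    git merge-base "$1" "$2" >/dev/null 2>&1
}
```

For example, shares_history HEAD broken/my/branch || echo "histories are unrelated".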
And, because we now share a common history with that remote, we can merge our previously broken branch:
$ git merge broken/my/branch
And the history is once again clean.