So, I inherited a repository where one submodule was removed, and I get the dreaded:

warning in tree 6eb01385fa82fdef80719ec4990bec2e0b591d47: nullSha1: contains entries pointing to null sha1

I tried to fix this using this answer. However, it assumes that I know which commit includes that tree, and I cannot find it:

git log --pretty='%H %T' | grep 6eb01385fa82fdef80719ec4990bec2e0b591d47

... doesn't return anything. Furthermore, running the filter-branch command used by the OP of the aforementioned question elicits a complaint about a commit, but that commit points to a different tree, and listing that tree with ls-tree also shows a pair of null SHA-1 entries.

So, in summary:

  • I seem to have at least two trees with null SHA1 entries
  • One is spotted by fsck but doesn't seem to be attached to any commit
  • Another one is attached to a commit but is not seen by fsck

Maybe I can fix the one belonging to a commit using the aforementioned answer, but how about the orphan tree?

Edit:

Thanks for all the suggestions here. Having a copy of the repo on a tmpfs on a fast machine makes testing all this a breeze. I eventually figured out part of the problem:

  • The commit filter-branch complains about (e884a3b0) contains tree e057f815a
  • Tree e057f815a contains just one tree: 6eb01385f
  • Tree 6eb01385f is the tree with two null SHA-1s that fsck complains about

Now I wonder how to apply the official answer since the troublesome tree isn't a direct child of the commit. As I understand it I should fix/replace 6eb01385f and reinsert that in e057f815a, and maybe then regen e057f815a to insert it in commit e884a3b0. So that would be:

  • git ls-tree {badtree} | sed -e '/0\{40\}/d' | git mktree to fix the bottom tree
  • git ls-tree {parenttree} | sed -e 's/badtree/fixedtree/' | git mktree to make a parent tree pointing to it
  • replace that one in the commit as indicated in that other answer
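
For the first step, a slightly stricter filter than the sed expression is possible. The sketch below assumes SHA-1 object IDs and the default `git ls-tree` output (mode, type, object ID, then a tab and the path). `drop_null_entries` is a hypothetical helper name, not a git command. It drops only the lines whose object field is all zeros, so a path that happens to contain forty zeros is left alone.

```shell
# Hypothetical helper: read `git ls-tree` lines on stdin and pass through
# every entry except those whose object ID (third field) is the null SHA-1.
# The string literal forces a string comparison in awk.
drop_null_entries() {
    awk '$3 != "0000000000000000000000000000000000000000"'
}
```

It would be used in place of the sed call, for example `git ls-tree {badtree} | drop_null_entries | git mktree`.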

OK, so I tried the hard way:

# Create new tree by removing empty SHA1s
git ls-tree 6eb01385fa82fdef80719ec4990bec2e0b591d47 | sed -e '/0\{40\}/d' | git mktree
0eabc1625026f92b2737e763a087f7c4000f0084

# Create new parent tree by replacing bad tree by fixed tree in parent tree
git ls-tree e057f815aec33a48981921289fc7ab25e9ea1a16 | sed -e 's/6eb01385fa82fdef80719ec4990bec2e0b591d47/0eabc1625026f92b2737e763a087f7c4000f0084/' | git mktree
df56fe08e90f1a30e6467ac2bba50a3d771c9de4

# Create new commit by replacing old parent tree by new parent tree
git cat-file commit e884a3b0040b3940d259cd72d82be20d5eb8d7c3 | sed 's/e057f815aec33a48981921289fc7ab25e9ea1a16/df56fe08e90f1a30e6467ac2bba50a3d771c9de4/' | git hash-object -t commit -w --stdin
b41674793c985ba63bc68b095024ebcb2fbf0370

# Replace old commit by new commit
git replace e884a3b0040b3940d259cd72d82be20d5eb8d7c3 b41674793c985ba63bc68b095024ebcb2fbf0370

So far so good. But the old commit and trees are still there. And if I try to remove them with:

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch Somedir1 Somedir2' --prune-empty --tag-name-filter cat -- --all

It complains that I should use "-r", so I use:

git filter-branch --force --index-filter 'git rm -r --cached --ignore-unmatch Somedir1 Somedir2' --prune-empty --tag-name-filter cat -- --all

which runs... but then the submodules have been replaced by directories that have the same name at the same place, so the above also drops a lot of useful files. And fsck still finds the bad tree, and in addition it finds many "dangling tags". Is there a way to just remove the two bad trees and the commit?

xenoid
  • Add `--all` to your `git log` command to view all reachable commits; but this may also not find it, if the null hash is in a tree that is not a top level tree. Use a recursive tree searcher if necessary to find such sub-trees. – torek Jul 10 '17 at 00:13
  • Didn't suffice and indeed the tree isn't a top-level tree. But thanks, still a step in the right direction. – xenoid Jul 10 '17 at 13:10

1 Answer

Brute force will probably do the trick:

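# Print every commit (from all refs and reflogs) whose tree, at any depth, contains tree $1.
which-commits-use-tree () 
{ 
    local REPLY;
    git rev-list --all --reflog | while read -r; do
            git ls-tree -dr "$REPLY" | grep -q "$1" && echo "$REPLY uses $1";
    done
}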

That's sort of tolerable as a one-off on medium-ish repos; it scanned the whole of git's history in about five minutes on my little system. If you've got anything substantially larger, you'll need patience or something heavier duty.
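
Once a commit is reported, it can help to see where in that commit the tree sits, since it need not be the top-level tree. A minimal sketch, assuming the default `git ls-tree` output format; `paths_of_object` is a hypothetical helper name, not part of git:

```shell
# Hypothetical helper: read `git ls-tree` lines on stdin and print the path
# of every entry whose object ID equals $1.
# Lines look like "<mode> <type> <object>\t<path>", so split on the tab first.
paths_of_object() {
    awk -F '\t' -v id="$1" '{
        split($1, meta, " ")
        # Appending "" forces a string comparison even if an ID looks numeric.
        if ((meta[3] "") == (id "")) print $2
    }'
}
```

For example, `git ls-tree -r -t <commit> | paths_of_object <tree-id>` lists the directories of `<commit>` that resolve to that tree; `-t` makes ls-tree show tree entries while recursing.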

git cat-file --batch-check='%(objectname) %(objecttype)' --batch-all-objects --buffer \
| awk '/commit|tree/{print $1}' | git cat-file --batch | your-scanner-here

is about the fastest way I can think of to dump the entire history structure for bulk scanning. It took six seconds on the git history and about 2m30 on the linux repo, which is reasonably encouraging. I'm probably not going to write the scanner for this, though.
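
For what it's worth, here is a rough sketch of a simpler, slower scanner for the null-entry case. It does not parse the binary `--batch` output. Instead it runs one `git ls-tree` per tree object, so expect it to be much slower than the pipeline above. The function names `dump_trees` and `scan_null_entries` are made up for this example, and SHA-1 object IDs are assumed.

```shell
# Dump every tree in the object store (reachable or not) as text:
# a "tree <id>" header line followed by that tree's ls-tree entries.
dump_trees() {
    git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
    awk '$2 == "tree" { print $1 }' |
    while read -r tree; do
        printf 'tree %s\n' "$tree"
        git ls-tree "$tree"
    done
}

# Scanner: read the dump on stdin and print, once each, the ID of every
# tree that has at least one entry pointing at the null SHA-1.
scan_null_entries() {
    awk '
        # Header lines have exactly two fields; entry lines start with a mode.
        $1 == "tree" && NF == 2 { current = $2; next }
        $3 == "0000000000000000000000000000000000000000" && !(current in seen) {
            seen[current] = 1
            print current
        }
    '
}
```

Run it as `dump_trees | scan_null_entries` from inside the repository. Because it walks all objects rather than refs, it should also report trees that no commit reaches directly.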

jthill
  • OK, so I guess that with this answer I figured out the problem, and edited my original question. Still need a bit of help for the "fix" part. – xenoid Jul 10 '17 at 13:11
  • What happens if you do `git replace --edit 6eb01385fa82fdef80719ec4990bec2e0b591d47` deleting the bad entry followed by `git filter-branch --index-filter : --tag-name-filter cat -- --all`? – jthill Jul 10 '17 at 13:54
  • I get `error: object 0eabc1625026f92b2737e763a087f7c4000f0084 is a tree, not a commit error: object 0eabc1625026f92b2737e763a087f7c4000f0084 is a tree, not a commit fatal: ambiguous argument 'refs/replace/6eb01385fa82fdef80719ec4990bec2e0b591d47^0': unknown revision or path not in the working tree. Use '--' to separate paths from revisions, like this: 'git [...] -- [...]' WARNING: Ref 'refs/replace/6eb01385fa82fdef80719ec4990bec2e0b591d47' is unchanged`. `ls-tree` doesn't show the null SHA-1s, but `fsck` still finds them. – xenoid Jul 10 '17 at 14:36
  • Did it rewrite the rest of the history? filter-branch keeps refs to the original history under refs/original, so fsck is still going to find the original bad objects until you delete those and repack. – jthill Jul 10 '17 at 14:48
  • Yes, rest seems to have been rewritten. How do I delete/repack? – xenoid Jul 10 '17 at 14:55
  • Okay, the error was just filter-branch trying to treat the replace ref as a branch, you can ignore that one. You can push the rewritten refs where you like, they should be fixed. You can also `git for-each-ref refs/original refs/replace --format='delete %(refname)'|git update-ref --stdin; git repack -ad` to do it locally. – jthill Jul 10 '17 at 15:04
  • Still problems with 'fsck' – xenoid Jul 10 '17 at 15:21
  • Actually, the tree I edited has been replaced by its original version (this is caused by the `for-each-ref`...) – xenoid Jul 10 '17 at 15:38
  • Tried some more things. I may need a way to selectively remove one given commit. – xenoid Jul 10 '17 at 21:30