
In trying to mirror a repo to a remote server, the server is rejecting tree object 4e8f805dd45088219b5662bd3d434eb4c5428ec0. This is not a top-level tree, by the way, but a subdirectory.

How can I find out which commit(s) indirectly reference that tree object so I can avoid pushing the refs that link to those commits in order to get all the rest of my repo to push properly?

Andrew Arnott
  • I've considered deleting the tree object then running `git fsck` hoping it would remove all references to it as part of recovery. But I don't know how to delete an object from a packfile either. – Andrew Arnott Dec 11 '16 at 15:58
  • How about finessing the problem: Use "git bisect" to find the commit that introduced the bad tree reference, and then you can git ls-tree that commit to find the bad tree. – Raymond Chen Dec 11 '16 at 18:03
  • @RaymondChen That might not work. Besides taking so long (bisect is awesome, but not so much on a tree this large) it may fail because the tree itself may fail to checkout on the relevant commit. Also, I need a "good" and a "bad" sample commit for bisect to get started, and I don't know which commit is bad. – Andrew Arnott Dec 12 '16 at 00:16

2 Answers


As you noted, you just need to find the commit(s) with the desired tree. If it could be a top level tree you would need one extra test, but since it's not, you don't.

You want:

  • for some set of commits (all those reachable from a given branch name, for instance)
  • if that commit has, as a sub-tree, the target tree hash: print the commit ID

which is trivial with two Git "plumbing" commands plus grep.

Here's a slightly updated version of my original script (updated to take arguments, and default to --all as in badp's edit):

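#! /bin/sh
#
# git-searchfor: print every commit that has the given tree as a subdirectory.
case $# in
0) echo "usage: git-searchfor <object-id> [<starting commit>...]" 1>&2; exit 1;;
esac

# Resolve the argument to a full hash, then peel it to a tree.
searchfor=$(git rev-parse --verify "$1") || exit 1
searchfor=$(git rev-parse --verify "$searchfor^{tree}") || exit 1
shift

# Default to every ref when no starting commits are given.
[ $# -gt 0 ] || set -- --all

# tformat: ends the last line with a newline, so `read` also sees the oldest commit.
git log "$@" --pretty='tformat:%H' |
    while read -r commithash; do
        if git ls-tree -d -r --full-tree "$commithash" | grep "$searchfor"; then
            echo " -- found at $commithash"
        fi
    done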

To check top-level trees as well, you would also run `git cat-file -p $commithash` and see whether its `tree` line has the hash in it.
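A minimal sketch of that top-level check, under a few assumptions: `commits_with_root_tree` is a hypothetical helper name (not part of Git or the original script), and instead of grepping `git cat-file -p` output for each commit it asks `git log` to print each commit's root tree with the `%T` placeholder and compares that to the target. The target should be given as a full hash.

```shell
# Hypothetical helper: print commits whose *root* tree is the target hash.
# Usage: commits_with_root_tree <full-tree-hash> [<starting commit>...]
commits_with_root_tree() {
    target=$1
    shift
    # Default to every ref when no starting commits are given.
    [ $# -gt 0 ] || set -- --all
    # %H is the commit hash, %T is the hash of that commit's root tree.
    git log "$@" --pretty='tformat:%H %T' |
        awk -v target="$target" '$2 == target { print $1 }'
}
```

Because this needs only one `git log` invocation, it is cheap to run alongside the subdirectory search.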

Note that this same code will find blobs (assuming you take out the -d option from git ls-tree). However, no tree can have the ID of a blob, or vice versa. The grep will print the matching line so you'll see, e.g.:

040000 tree a3a6276bba360af74985afa8d79cfb4dfc33e337    perl/Git/SVN/Memoize
 -- found at 3ab228137f980ff72dbdf5064a877d07bec76df9

To clean this up for general use, you might want to use git cat-file -t on the search-for blob-or-tree to get its type.
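One way that type check might look, as a sketch rather than a finished tool: `ls_tree_flags_for` is a hypothetical helper name, and it assumes the caller passes a tree or blob ID directly. Unlike the script above, it rejects commit and tag IDs instead of peeling them to a tree.

```shell
# Hypothetical helper: choose git ls-tree flags from the object's type.
# Usage: flags=$(ls_tree_flags_for <object-id>) || exit 1
ls_tree_flags_for() {
    kind=$(git cat-file -t "$1") || return 1
    case "$kind" in
    tree) echo "-d -r --full-tree" ;;  # list only tree entries
    blob) echo "-r --full-tree" ;;     # blobs are leaf entries, so drop -d
    *)
        echo "error: $1 is a $kind, expected a tree or blob" 1>&2
        return 1
        ;;
    esac
}
```

The search loop could then run `git ls-tree $flags "$commithash"`, leaving `$flags` unquoted on purpose so that it splits into separate options.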

As jthill notes in a comment, `git diff-tree` now has a `--find-object` option. This was introduced in Git 2.17 (released in 2018, well after the original question here). The `git log` command has this as well, but we're usually more interested in which specific commit added a file or tree. By removing the extra line that tries to force the `searchfor` hash ID to be a tree, we can get a much faster script that finds every occurrence of any tree or blob object (though you must take care to specify the correct hash ID, or add the `^{tree}` suffix yourself if you're going to supply a commit hash ID). Then we just run:

git log --all --find-object=$searchfor

or, as in the comment below:

git rev-list --all | git diff-tree --stdin --find-object=$searchfor

to find what we're looking for. (Add ${2-"--all"} if/as desired.)
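To get from those commits to the refs that should not be pushed, the `git branch --contains` idea from the comments can be sketched as follows. `refs_containing_object` is a hypothetical helper name; it assumes Git 2.17 or later for `--find-object`, a full object hash as its argument, and it uses `git for-each-ref --contains` rather than `git branch --contains` so that tags and remote-tracking refs are covered too.

```shell
# Hypothetical helper: list every ref whose history includes a commit
# that adds or removes the given object.
# Usage: refs_containing_object <full-object-hash>
refs_containing_object() {
    git log --all --find-object="$1" --format=%H |
        while read -r commit; do
            # Print each ref from which this commit is reachable.
            git for-each-ref --contains "$commit" --format='%(refname)'
        done | sort -u
}
```

Any ref it prints reaches at least one commit that touches the object, so those are the refs to leave out of the push.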

torek
  • Thanks! This should work, albeit since I don't know which branch/tag has the bad tree, I'd have to run the whole thing in a loop over each one of my several thousand branches (it's a big repo with lots of users). So I'll have to craft some way to get a list of every single commit in the repo across branches and eliminate duplicates first. But this is a great start. – Andrew Arnott Dec 12 '16 at 00:33
  • It looks like `git rev-list` returns a set of commits for any number of refs. And it takes `--stdin` as a parameter. So I can do `git branch -r | git rev-list --stdin` and otherwise keep using your script. :) Except `git branch` adds whitespace in front of each branch name, which `git rev-list` doesn't like, so I wrote to a file, cleaned it up, then piped it into your script. Now it's very busy searching. – Andrew Arnott Dec 12 '16 at 00:42
  • I was even able to change `origin/master` to `^origin/master` to greatly cut down the number of commits since I know that the tree in question isn't anywhere on the master branch. – Andrew Arnott Dec 12 '16 at 00:43
  • `git rev-list` takes the *same* arguments as `git log`. In fact, they're basically the same command! They are built from one source file that just changes the default settings when run as `git log` vs `git rev-list`. Rev-list is intended for use in scripts, though, while log is intended for use by humans. In any case `A..B` "means" `B ^A` so `origin/master..master` and `master ^origin/master` are exactly the same thing here. In this case you can use `git rev-list --branches ^origin/master` (or maybe `--branches --tags`). – torek Dec 12 '16 at 02:50
  • So, it worked! I did find that the script subtly would only find trees that are subdirectories of the current one (as opposed to starting at the root). I had fixed that, but then it took so long to complete, and I had a pretty good idea of which directory the tree represented so I took advantage of that as an optimization. I got several commits now to work with. :) – Andrew Arnott Dec 12 '16 at 03:57
  • Oh, right, `git ls-tree` likes to start from your current subdirectory (not sure why) and we need `--full-tree` to prevent that. – torek Dec 12 '16 at 07:39
  • faster: `git rev-list --all | git diff-tree --stdin --find-object=4e8f805dd45088219b5662bd3d434eb4c5428ec0` to find all commits introducing that tree, then `git branch --contains` those commits to avoid pushing anything with it. – jthill Oct 26 '22 at 13:04
  • @jthill Right, `--find-object` (new in Git 2.17) is now the way to go. I'll tweak the answer a bit. – torek Oct 26 '22 at 23:36

Variation of great answer by torek in case you want to speed things up via GNU Parallel:

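#!/bin/bash
# Usage: <script> <object-id> [<starting commit>]
searchfor="$1"
startpoints="${2-HEAD}"

# Run one ls-tree | grep per commit; parallel substitutes each commit hash for {}.
git rev-list "$startpoints" |
    parallel "if git ls-tree -d -r --full-tree '{}' | grep '$searchfor'; then echo ' -- found at {}'; fi"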
shawkinaw