
Suppose I have a file that may already be in a git repository, and it might reside under multiple pathnames, even in the same commit.

How do I find all commits containing a blob with a hash corresponding to that file, and list those commits along with the pathname(s) under which the file resides in each?

Is there a find-file-by-hash technique that also searches the index and working directory?

lionel
  • I think the hashes define commits not single files... – Riccardo Petraglia Sep 20 '16 at 19:18
  • @RiccardoPetraglia, that's wrong: hashes identify everything Git stores, and it stores objects of several types, chiefly blobs (contents of files), trees and commits. Commits reference trees and their parent commit(s); trees reference other trees and blobs. All this "referencing" happens via the SHA-1 names (those "hashes") of these objects. – kostix Sep 20 '16 at 19:28
  • Possible duplicate of [Which commit has this blob?](http://stackoverflow.com/questions/223678/which-commit-has-this-blob) – kostix Sep 20 '16 at 19:30
  • You just need to formulate the question correctly ;-) In Git's parlance, the contents of a file are called "a blob", so you need to search for "which git commit contains blob". See the already answered question I linked to. – kostix Sep 20 '16 at 19:31
  • @kostix Thank you! I found this, I don't know if it can be helpful: `find .git/objects -type f | grep ` – Riccardo Petraglia Sep 20 '16 at 19:35
  • @RiccardoPetraglia, no, sorry, that's outright lame: Git objects might be stored elsewhere. The thread I linked to in my other comment contains sensible approaches. Please consider reading it. – kostix Sep 20 '16 at 21:33
  • `git ls-tree -r | grep ` can help. – ElpieKay Sep 20 '16 at 23:20
  • @kostix I'm all for getting my terminology right, but are the contents of a file really called a blob? A blob consists of a header plus the contents of a file, which leads to a recursive definition if the contents of the file are also called a blob !-) – lionel Sep 21 '16 at 07:44
  • @kostix Thank you for that reference. It's a helpful step in the right direction. However, I really am interested in the "round trip" of finding all of the places a file's contents already exist in a repository, meaning the commits/pathnames, not just the commits/trees that reference the right blob. – lionel Sep 21 '16 at 07:57
  • 1) No, the blob contains no headers; only commits do. Please see my other comment directed to @RiccardoPetraglia for the terminology breakdown. Actually the _original_ contents are not a blob; to "obtain" a blob, you run `git hash-object -w filename`, which a) actually produces a blob out of the file's contents; b) writes that into the object store (if not already there); c) prints the blob's SHA-1 name. Still, the only difference is that a blob only contains data, is usually compressed, and has no metadata attached to it. – kostix Sep 21 '16 at 10:59
  • @kostix Since the terminology isn't at the heart of the question, I asked a separate question to explain why a file's contents aren't _called_ a blob, they're _represented_ by one (and, as it happens, the blob contains a header that I explain further): [Is “blob” a synonym for the contents of a file I put in a git repository?](http://stackoverflow.com/questions/39627054/is-blob-a-synonym-for-the-contents-of-a-file-i-put-in-a-git-repository/39627127#39627127) – lionel Sep 22 '16 at 02:15
  • @kostix I disagree with calling this a duplicate. If it were, I would find the other question an answer to what I asked. – lionel Sep 24 '16 at 01:13

2 Answers


OK, to expand on the accepted answer to the linked question.

As for finding all commits together with their pathnames: the only thing the script in the accepted answer does not do for you is print the pathname. But fear not, it's easy to modify.

If you go to a nearby Git repository and run `git ls-tree -r HEAD`, you'll see that this command dumps the whole tree hierarchy referenced by the named commit (`HEAD` in our case), with both SHA-1 names and "normal" filenames. The script from the answer just greps this output to find the SHA-1 name and ignores the rest.
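
To make that output format concrete, here is a minimal sketch that builds a throwaway repository and prints the listing. The temporary directory, file names and committer identity are assumptions made up for the demonstration; only `git ls-tree -r` itself comes from the text above.

```shell
#!/bin/sh
# Build a throwaway repository so the output format can be inspected safely.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # hypothetical identity, demo only
git config user.name "Example"

# Two files with identical contents, hence a single shared blob.
mkdir -p docs
printf 'hello\n' > a.txt
cp a.txt docs/b.txt
git add .
git commit -q -m "add two copies"

# Each output line has the form: <mode> SP <type> SP <object-name> TAB <path>
listing=$(git ls-tree -r HEAD)
printf '%s\n' "$listing"
```

Both lines of the listing should carry the same object name, because the two paths share one blob.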

So we can modify the script to read:

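#!/bin/sh
# First argument: the full SHA-1 name of the blob; the rest is passed to git log.
obj_name="$1"
shift
# tformat: terminates every line with a newline, so `read` also sees the last commit.
git log "$@" --pretty=tformat:'%T %h %s' \
| while read -r tree commit subject ; do
    # ls-tree -r prints: <mode> <type> <object> TAB <path>
    git ls-tree -r "$commit" | while read -r mode type sha name; do
      if [ "$sha" = "$obj_name" ]; then
        # No break here: the blob may live under several paths in one commit.
        printf '%s\t%s\n' "$commit" "$name"
      fi
    done
  done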

…and it will now also print the name of the file associated with the target blob along with the commit name.
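
The question also asks about the index and the working directory, which `git log` and `git ls-tree` do not cover. The following is only a sketch of one possible approach, not part of the script above: it assumes `git hash-object` (without `-w`) to compute the blob name of a file, `git ls-files -s` to list index entries, and `git ls-files --cached --others --exclude-standard` to enumerate tracked and untracked, non-ignored files. The function name `find_blob_in_index_and_worktree` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report where the contents of FILE occur in the index
# and in the working tree of the current repository.
find_blob_in_index_and_worktree() {
    # Compute the blob name of the file without writing it to the object store.
    target=$(git hash-object "$1") || return 1
    (
        # ls-files reports paths relative to the current directory,
        # so start from the top of the work tree.
        cd "$(git rev-parse --show-toplevel)" || exit 1

        # Index entries look like: <mode> <object> <stage> TAB <path>
        git ls-files -s | while read -r mode sha stage path; do
            if [ "$sha" = "$target" ]; then
                printf 'index\t%s\n' "$path"
            fi
        done

        # Working tree: hash every tracked or untracked (non-ignored) file.
        git ls-files --cached --others --exclude-standard |
        while IFS= read -r path; do
            [ -f "$path" ] || continue
            if [ "$(git hash-object "$path")" = "$target" ]; then
                printf 'worktree\t%s\n' "$path"
            fi
        done
    )
}
```

Calling `find_blob_in_index_and_worktree some/file` from inside a work tree prints one `index` or `worktree` line per matching path. Paths that Git quotes in its output (for example, names containing unusual characters) are skipped by this sketch.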

kostix
  • This example only gives the basename, not the path. Is it clear yet that what you're calling a duplicate covers a mere 50% of what I asked? And that's only when you read a non-accepted answer to get a clue how to also search the index (albeit not the working directory, though that I admit lacks the practical application of searching everywhere managed by git). – lionel Sep 22 '16 at 08:05
  • @lionel, pardon me, but how then do you define "basename"? I mean, pathnames shown by `git ls-tree -r ` are relative to the root of the repository this commit is in, which is IMO a natural (the only one possible, even) thing. What did you expect instead? Maybe it will be more clear then. – kostix Sep 22 '16 at 10:25
  • My mistake: the "-r" flag does make ls-tree output the path (more than just the basename). However, to find **all** occurrences, don't you also need the `--full-tree` option? – lionel Sep 24 '16 at 01:06
  • Another thing missing from what you're calling the "accepted answer" is picking up any instances from the index. – lionel Sep 24 '16 at 01:10

You can probably find the answer in [Which commit has this blob?](http://stackoverflow.com/questions/223678/which-commit-has-this-blob) (duplicate).

Summarizing:

git rev-list <commit-list> | \
xargs -I X sh -c 'git ls-tree -r X | grep <SHA1> && echo X'

You can use `--all` instead of `<commit-list>` to search all commits reachable from any ref.
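
For the full round trip from a file on disk to every commit and path holding the same contents, the pipeline above can be combined with `git hash-object`. This is a hedged sketch rather than a definitive tool: the function name `list_commits_with_blob` is hypothetical, and `--full-tree` is added on the assumption that the command may be run from a subdirectory, where `git ls-tree` would otherwise only list entries below the current directory.

```shell
#!/bin/sh
# Hypothetical helper: print "<commit> TAB <path>" for every commit reachable
# from any ref that contains a blob identical to the contents of FILE.
list_commits_with_blob() {
    # Blob name of the file as Git would store it (nothing is written).
    target=$(git hash-object "$1") || return 1
    git rev-list --all | while read -r commit; do
        # --full-tree lists the whole commit even when run from a subdirectory.
        git ls-tree -r --full-tree "$commit" |
        while read -r mode type sha path; do
            if [ "$sha" = "$target" ]; then
                printf '%s\t%s\n' "$commit" "$path"
            fi
        done
    done
}
```

Running `list_commits_with_blob path/to/file` inside a repository prints one line per commit and path. It walks every tree of every commit, so it can be slow on large histories.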

Riccardo Petraglia