Search git for a specific version of a file

Question

I have a file that was copied from a git repository at some point. I want to find what SHA the file was copied at.

Currently I'm going to get a list of all the SHAs and diff the file at each SHA against the file I have, but I was hoping for a better way.

A file with one particular name can be in many, or even every, commit. Each commit has a copy of the file. Each commit may be sharing the *same* copy of the file, or some, or all, commits might have *different* copies of that file by that name. What precisely do you want out of Git? There's a relatively fast way to find every commit in which that file exists *and* exactly matches the copy you have, but of course there is no guarantee that any such commit exists. — torek, Jan 08 '20 at 03:25
Theoretically, you can calculate the SHA of your file, which should correspond to the SHA of a certain blob in the Git repo. With the blob SHA, you can find out which commits contain this blob; see https://stackoverflow.com/questions/223678/which-commit-has-this-blob . This should help you identify the first or last commit that contains the file. However, if you want to find the specific commit you copied the file from, that is almost impossible. — Adrian Shum, Jan 08 '20 at 03:29
To get the hash some file *would* have if inserted into a Git repository, use `git hash-object`. — torek, Jan 08 '20 at 03:37
@torek i think OP would probably like this « relatively fast way » you mention (and I’d be interested in seeing it) — D. Ben Knoble, Jan 08 '20 at 03:55
@D.BenKnoble: see Adrian Shum 's comment link (or "which commit has this blob", the link shows up on the right sidebar in my browser). The Perl code is the fast one; it memo-izes the result of the tree searches, so commit searches wind up being hash table lookups. That will match regardless of the actual committed file name; you can filter that down, or write a slightly different variant. — torek, Jan 08 '20 at 04:02
Interesting. Does find-object work with a blob? I.e. `git log --find-object=$(git hash-object ...)` — D. Ben Knoble, Jan 08 '20 at 04:06
@D.BenKnoble: Aha, I hadn't seen the `--find-object` option. It was first introduced in Git 2.17. It acts like `-S` so it won't find every commit that contains the object, but rather each commit that adds or removes a copy of the object. — torek, Jan 08 '20 at 04:22
@torek True, as seen in https://stackoverflow.com/a/48590251/6309 — VonC, Jan 08 '20 at 05:05

score 3 · Accepted Answer · answered Jan 08 '20 at 06:15

To me it looks like the following situation:

Say, we are talking about file foo.txt that was added to git repository (say, empty) at some point:

touch foo.txt
git add foo.txt
git commit -m "added foo.txt"
# commit '123'

Then it was changed a couple of times:

# add "hello" to foo.txt
git add foo.txt
git commit -m "Added 'hello'"
# commit '456'
# add "world" to foo.txt
git add foo.txt
git commit -m "Added 'world'"
# commit '789'

At this point the file (which contains 'hello world') gets copied aside (cp foo.txt /tmp/foo.txt), and that copy is the reference copy.

Afterwards, other commits (say, abc and def) altered the content of foo.txt, so that the file in git now looks like:

Hello world
How are you doing?
I'm fine, and you?
I'm Ok too

So the question actually being asked is how to find the commit after which foo.txt looks exactly like the reference copy stored in /tmp/foo.txt, i.e. it contains only "Hello world" (commit 789 in my example).

In this case I believe you should use the git bisect command, giving it the last commit and the first (initial) commit as boundaries. If there are many commits in the project, a binary search will be much faster than iterating over every commit in the history of the file.

Read about it here

You need a way to decide whether a commit is 'good' or 'bad' in git bisect terms (for example, whether the file contains the words 'hello' and 'world').

I believe `git bisect` can even take a command to run, so you could do `git bisect diff ` or whatever the syntax is — D. Ben Knoble, Jan 08 '20 at 13:40
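For what it's worth, the subcommand that takes a command is `git bisect run`. Below is a rough sketch of how the bisect idea could be wired up, not a definitive recipe. `find_copy_commit_by_bisect` is a made-up helper name. Since "file equals the reference" is not something a binary search can test directly, the sketch bisects on file size and leans on strong assumptions: it is run from the repository root, history between the two boundary commits is linear, the file exists in every commit of that range, the file only ever grows, and it is already larger than the reference at the last commit. If any of these fails, the result is meaningless, which is why the last step re-checks the blob hash.

```shell
# Sketch only: find_copy_commit_by_bisect is a hypothetical helper, not a git command.
# Usage: find_copy_commit_by_bisect <reference-file> <path-in-repo> <first-commit> <last-commit>
# Assumes: run from the repo root, linear history, <path-in-repo> exists in every
# commit of the range, the file only grows, and it is larger than the reference
# at <last-commit>.
find_copy_commit_by_bisect() {
    ref=$1 path=$2 first=$3 last=$4

    # Byte size of the reference copy; $(( )) strips any padding wc may add.
    ref_size=$(( $(wc -c < "$ref") ))

    # "bad"  = the file has grown past the reference copy,
    # "good" = it has not (yet).
    git bisect start "$last" "$first" >/dev/null || return 1
    git bisect run sh -c '[ "$(wc -c < "$1")" -le "$2" ]' sh "$path" "$ref_size" >/dev/null 2>&1

    # The parent of the first "bad" commit is the last commit that can still match.
    candidate=$(git rev-parse --verify -q "refs/bisect/bad^")
    git bisect reset >/dev/null 2>&1

    # Print the candidate only if its blob is byte-identical to the reference.
    [ -n "$candidate" ] &&
        [ "$(git rev-parse "$candidate:$path")" = "$(git hash-object "$ref")" ] &&
        echo "$candidate"
}
```

With the example above, that would be called as `find_copy_commit_by_bisect /tmp/foo.txt foo.txt <commit 123> HEAD`. If it prints nothing, either no commit matches or one of the assumptions does not hold, and a full scan of the history is the safer fallback.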

score 1 · Answer 2 · answered Jan 08 '20 at 06:29

`git hash-object path/to/file` will show you the blob id for a file in your work tree.

`git log --no-abbrev --full-history --cc --raw --pretty=format:%H -- path/to/file` will dump the old and new blob ids at every change to that file in your history.

git log --no-abbrev --full-history --cc --raw --pretty=format:%H -- path/to/file \
| awk '$5==blob' RS='' blob=`git hash-object path/to/file`

will show you the commit(s) that produced the file you've got.
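As a variation, the `--find-object` option discussed in the comments can do the matching for you. This is only a sketch: `commits_touching_copy` is a hypothetical helper name, and, as torek notes above, the option reports commits that add or remove the blob rather than every commit that contains it. It also matches the blob under any file name, not only the original path.

```shell
# Hypothetical helper: list commits that add or remove the blob matching <file>.
# Usage: commits_touching_copy <reference-file>
commits_touching_copy() {
    # Blob id the reference copy would have inside the repository.
    blob=$(git hash-object "$1") || return 1

    # Newest first; --all searches every ref, not only the current branch.
    git log --all --format=%H --find-object="$blob"
}
```

Because the output is newest first, the last line is the oldest commit that touched the blob, which is normally the one that introduced this version of the file: `commits_touching_copy /tmp/foo.txt | tail -n 1`.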

score 0 · Answer 3 · answered Jan 08 '20 at 05:52

It seems you can use git describe for this.

git describe <sha>

From the git describe man page:

If the given object refers to a blob, it will be described as <commit-ish>:<path>, such that the blob can be found at <path> in the <commit-ish>, which itself describes the first commit in which this blob occurs in a reverse revision walk from HEAD.

To get the <sha> of the file, use

git ls-files -s <file>

Like Adrian Shum points out in the comments, this will only give you the first commit the file is in. You can use git log to help you find the next commit at which the file changes, giving you a range of possible commits.
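Note that `git ls-files -s` reads the SHA from the index, so it only helps if the file is tracked in the current checkout. For a copy that lives outside the repository, `git hash-object` (mentioned in the comments on the question) computes the same kind of id. A minimal sketch combining the two steps, where `describe_copy` is a made-up helper name and `--always` is added on the assumption that the repository may have no tags:

```shell
# Hypothetical helper: describe the blob matching <file> as <commit-ish>:<path>.
# Usage: describe_copy <reference-file>
describe_copy() {
    # Blob id the reference copy would have inside the repository.
    blob=$(git hash-object "$1") || return 1

    # --always falls back to an abbreviated commit hash when no tag is reachable.
    git describe --always "$blob"
}
```

For the example in the accepted answer this would be `describe_copy /tmp/foo.txt`; the part before the colon names the commit and the part after it is the path where the blob was found.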

For more info:

git describe: Which commit has this blob?
git ls-files: Git - finding the SHA1 of an individual file in the index
