find out which git commit a file was taken from

Question

A collaborator who does not use version control has sent me a file with some local modifications, and he has now gone on vacation. I would like to find out which version his edits were based on.

I suppose the most promising approach is to iterate over the most recent commits and find the one for which the diff against his file is shortest.

Is there some ready-made functionality for this, or would I need to code this myself? Not being a git expert myself, what would be the most promising way to proceed?

Wait until they come back from vacation, admonish them for not giving full context into what it was they modified and where it came from, and then go from there. — Makoto, Jul 15 '15 at 09:35
While this is certainly a sensible suggestion, waiting ~1 month doesn't help me in the current state of affairs. — carsten, Jul 15 '15 at 09:38
Depending on the nature of the changes, it may not be terrible to create a branch off of what is currently out in the wild (master) and see what collides. Unless the changes that they did were mission critical, there really shouldn't be that much of a hurry to get them in - but far be it from me to dictate what "hurry" you'd be in. Personally, if they *were* critical changes, not providing the full context of where they came from would be a very serious issue, as it hamstrings the maintainers further from getting the proper fix out in time. — Makoto, Jul 15 '15 at 09:41
The issue is that the collaborator in question is one of the senior project developers. It's highly likely that the changes don't actually break anything; the main problem is to merge those changes with what happened on the trunk in the meantime. — carsten, Jul 15 '15 at 09:44
Oh, and the changes were designed to extend the core functionality of the project so as to facilitate further development by others, which means that as long as these changes are not merged to the trunk, development is stalled. — carsten, Jul 15 '15 at 09:48
You could (ab)use [`git bisect`](http://git-scm.com/docs/git-bisect) for automating this task. — jub0bs, Jul 15 '15 at 10:04
@Jubobs: `bisect` requires the searched range to have a monotonic metric ("score"). With file comparison and metrics like "diff length" you cannot be sure of that, and bisect may simply lie to you. — quetzalcoatl, Jul 15 '15 at 10:41
@carsten: If the senior developer made some changes and sent you the updated file as a separate file/note/attachment/etc., then there's a chance he worked on an offline machine and still has some "backup copy" (e.g. a zipped project in his mailbox or on a pen drive) of the original code on which he started that fix. Otherwise he'd simply have committed the file/project somewhere and told you to merge branches or repos. Could you ask him to send the original unchanged file? This would make finding the originating commit much easier. — quetzalcoatl, Jul 15 '15 at 10:44
A senior developer who does not use version control ... the world is #@%$ed. — Jonathan Wakely, Jul 15 '15 at 11:02
@quetzalcoatl: Yes, you are right, and yes, I could, if he weren't camping for 4 weeks somewhere without a network connection. — carsten, Jul 15 '15 at 12:22

Alderath · Accepted Answer · 2015-07-15T14:50:54.220

I do not know of any standard git command to do this, but a simple script can help. First, create a tmp-branch and commit the received file to this branch. Then create a simple script like the one below to print how different the file is from each of the 50 most recent versions of that file.

    #!/bin/bash

    BRANCH="tmp-branch"        # branch holding the received file
    FILE="path/to/file.txt"
    # The 50 most recent commits on master that touched the file
    RECENT_COMMITS=$(git rev-list -50 master -- "$FILE")

    for COMMIT in $RECENT_COMMITS
    do
        echo -n "$COMMIT: "
        git diff --shortstat "$BRANCH" "$COMMIT" -- "$FILE"
    done

Not fully automatic, but it gives you output like the following, in which you identify the version with the fewest changes. In my example, the simplistic change I used was based on edff0c0.

    e2b2c157a81e0523e7d4a0a52df79cb4fce981ac:  1 file changed, 12 insertions(+), 16 deletions(-)
    154d84736f4df3dd968450599dc254cda56f2057:  1 file changed, 12 insertions(+), 13 deletions(-)
    ba11ecc3a4d8268f43589fb929f0877e65879f13:  1 file changed, 11 insertions(+), 13 deletions(-)
    017a7a5abdffeb37671a03c0db2e32c37b0ee6bd:  1 file changed, 8 insertions(+), 9 deletions(-)
    cc97d3453ebde37b02a42ca7263bf7a983222d4d:  1 file changed, 8 insertions(+), 5 deletions(-)
    a84adb9e337d2cf1e851924cf27f5f0bfdca790f:  1 file changed, 7 insertions(+), 4 deletions(-)
    9a3c10cefc133792377851b1b5cb8a69d3ffd788:  1 file changed, 7 insertions(+), 3 deletions(-)
    edff0c0155b77e39599402574ba1c4aa02c1bbac:  1 file changed, 6 insertions(+), 2 deletions(-)
    413800ab0de606548c0c69b4b35e50b527d33d7f:  1 file changed, 13 insertions(+), 2 deletions(-)
    af689f1d6d76303d8e39311f48a977b87260586e:  1 file changed, 13 insertions(+), 2 deletions(-)
    25123d4196533a0f3ce718a288bc3c5d975ad865:  1 file changed, 24 insertions(+), 3 deletions(-)
    e7ca01b247f7e32010f256b55696c3ecb1d72144:  1 file changed, 26 insertions(+), 5 deletions(-)
    6e9c2a561cc606f34ccb2cc918b297187c2e8c42:  1 file changed, 33 insertions(+), 23 deletions(-)

I'm not sure if this method is foolproof. You should probably have a look at a couple of the surrounding commits also.
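Picking the smallest entry by eye can be automated. The following is a minimal sketch, assuming the same setup as the script above (the received file is committed on `tmp-branch`); the function name `rank_commits` and its argument order are made up for illustration. It sums insertions and deletions from `git diff --numstat` and sorts the commits so the closest one comes first. Binary files are reported as `-` by `--numstat` and would count as zero here, so this only makes sense for text files.

```shell
#!/bin/bash
# rank_commits <branch-with-received-file> <ref-to-search> <path> [max-commits]
# Prints "<changed-lines> <commit>" for each commit, smallest difference first.
rank_commits() {
    local branch="$1" ref="$2" file="$3" count="${4:-50}"
    local commit
    git rev-list --max-count="$count" "$ref" -- "$file" |
    while read -r commit; do
        # --numstat prints "<added> <deleted> <path>", or nothing when the
        # two versions are identical; awk adds the two counts (0 if empty).
        git diff --numstat "$branch" "$commit" -- "$file" |
            awk -v c="$commit" '{ n += $1 + $2 } END { print n + 0, c }'
    done | sort -n
}
```

For example, `rank_commits tmp-branch master path/to/file.txt | head -n 5` would list the five closest candidates. As with the manual approach, the top entry is only a candidate, so it is still worth inspecting the neighbouring commits.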

the `echo -n` is broken, because `git` starts printing at column zero, and the commit ids are overwritten. — Felipe Alvarez, Oct 17 '17 at 05:21
@FelipeAlvarez I guess the behaviour must somehow differ between environments, then. In my environment I get the correct output (Ubuntu 15.04, git 2.1.4). For that case, I guess an alternative would be to save the output of the git diff command in a variable, and output the commit id variable and the diff variable from the same echo command. — Alderath, Oct 17 '17 at 11:12
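The alternative described in the comment above could look roughly like this. It is a sketch only: the function name `print_stats` is hypothetical, and since the cause of the overwritten output was never identified in this thread, it is not certain that this resolves it in every environment.

```shell
#!/bin/bash
# print_stats <branch-with-received-file> <ref-to-search> <path>
# Emits each commit id and its shortstat from a single echo call.
print_stats() {
    local branch="$1" ref="$2" file="$3"
    local commit stat
    for commit in $(git rev-list --max-count=50 "$ref" -- "$file"); do
        # Capture the diff summary first instead of letting git print it
        stat=$(git diff --shortstat "$branch" "$commit" -- "$file")
        # shortstat output starts with a space; it is empty if nothing differs
        echo "$commit:$stat"
    done
}
```

A call such as `print_stats tmp-branch master path/to/file.txt` then produces one line per commit. A commit whose version is identical to the received file shows up as a bare `<commit>:` line.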

score 2 · Answer 2 · answered Jul 15 '15 at 10:32

I'd create a new branch with the colleague's change and then use `git merge-base`:

> `git merge-base` finds best common ancestor(s) between two commits to use in a three-way merge. One common ancestor is better than another common ancestor if the latter is an ancestor of the former. A common ancestor that does not have any better common ancestor is a best common ancestor, i.e. a merge base. Note that there can be more than one merge base for a pair of commits.

Jiri Kremser

I think this does not do what he wants. I think `git merge-base` searches the **tree of commits** and finds which "ancestor" is the best. This means it will be very sensitive to where you actually create the fake new branch. If you create it off the master head, the result will be different than if you create it off some very old commit. I think merge-base would work best if the OP already knew which commit the file came from and created the branch directly on that commit. I think it will not work at all in this case. However, I have not tried. I'd be very glad if I'm wrong! – quetzalcoatl Jul 15 '15 at 10:39
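The concern in the comment above can be checked in a throwaway repository. The sketch below uses made-up names and contents: it creates three versions of a file on `master`, then commits a file that was really derived from the second version on a `tmp-branch` forked from the tip. In this toy setup `git merge-base` reports the fork point (the tip of `master`), because it looks only at commit ancestry and never at file contents.

```shell
#!/bin/bash
# Demo: merge-base returns where tmp-branch was forked, not the commit
# whose file content the received file was derived from.
demo_merge_base() {
    local repo
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        git symbolic-ref HEAD refs/heads/master   # fix the branch name
        git config user.email demo@example.com
        git config user.name demo
        for v in 1 2 3; do
            echo "version $v" > file.txt
            git add file.txt
            git commit -q -m "v$v"
        done
        # The received file is based on v2, but the branch starts at v3.
        git checkout -q -b tmp-branch
        echo "version 2 plus edits" > file.txt
        git commit -q -am "collaborator's file"
        echo "merge-base: $(git merge-base tmp-branch master)"
        echo "master tip: $(git rev-parse master)"
    )
    rm -rf "$repo"
}
```

Running `demo_merge_base` prints the same hash on both lines, so merge-base alone cannot recover the originating commit unless the branch was already created at the right place.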
