50

Someone took a version (unknown to me) of Moodle, applied many changes within a directory, and released it (tree here).

How can I determine which commit of the original project was most likely edited to form this tree?

This would allow me to form a branch at the appropriate commit with this patch. It surely came from either the 1.8 or 1.9 branch, probably from a release tag, but diffing against individual commits doesn't help me much.

Postmortem Update: knittl's answer got me as close as I'm going to get. I first added my patch repo as the remote "foreign" (no commits in common, that's OK), then ran diffs in loops with a couple of format options. The first used the --shortstat format:

for REV in $(git rev-list v1.9.0^..v1.9.5); do 
    git diff --shortstat "$REV" f7f7ad53c8839b8ea4e7 -- mod/assignment >> ~/rdiffs.txt; 
    echo "$REV" >> ~/rdiffs.txt; 
done;

The second just counted the line changes in a unified diff with no context:

for REV in $(git rev-list v1.9.0^..v1.9.5); do 
    git diff -U0 "$REV" f7f7ad53c8839b8ea4e7 -- mod/assignment | wc -l >> ~/rdiffs2.txt;
    echo "$REV" >> ~/rdiffs2.txt; 
done;

There were thousands of commits to dig through, but this one seems to be the closest match.
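The two-line records written to ~/rdiffs.txt (a --shortstat line followed by the revision) can be ranked without reading them by hand. A minimal sketch, assuming that file layout; the function name `rank_shortstat` is made up for illustration, and a revision with no preceding stat line (an identical tree) is counted as 0:

```shell
# rank_shortstat: read "shortstat line, then revision" records on stdin and
# print "<insertions+deletions> <revision>", smallest total first.
rank_shortstat() {
    awk '
        /files? changed/ {
            total = 0
            for (i = 2; i <= NF; i++) {
                # the number sits in the field before "insertions(+)" / "deletions(-)"
                if ($i ~ /^insertion/ || $i ~ /^deletion/) total += $(i - 1)
            }
            have_stat = 1
            next
        }
        /^[0-9a-f]+$/ {
            n = have_stat ? total : 0
            print n, $1
            have_stat = 0
        }
    ' | sort -n
}
```

Usage would be something like `rank_shortstat < ~/rdiffs.txt | head`, which lists the candidate commits with the fewest changed lines first.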

Steve Clay
  • 2
    If you can find some meaningful test that you can apply to a particular commit of the moodle repo and the initial commit of moodle-rubric to determine whether the latter happened before or after the former, you could use `git bisect` to quickly home in on the source commit. Check out `git bisect --help` for the skinny. – Simon Whitaker Jun 17 '11 at 16:07
  • 1
    Thanks used this now with great success. Used `cat rdiffs.txt | grep -oe '[0-9]* insertions' | sort -n | head -n 10` (and similar) to narrow down the lowest amount of changes without having to do any manual searching. Just mentioning in case anyone finds the additional information handy. – Malcolm MacLeod Apr 05 '15 at 18:45

5 Answers

18

You could write a script that diffs the given tree against a range of revisions in your repository.

Assume we first fetch the changed tree (without history) into our own repository:

git remote add foreign git://…
git fetch foreign

We then output the diffstat (in short form) for each revision we want to match against:

for REV in $(git rev-list 1.8^..1.9); do
   # print the revision so each stat line can be traced back to its commit
   echo "$REV"
   git diff --shortstat foreign/master "$REV"
done

Look for the commit with the smallest number of changes (or use some sorting mechanism).
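One possible "sorting mechanism" is to compute a single number per revision and sort on it. The sketch below is an assumption-laden variant, not the method above verbatim: it swaps --shortstat for `git diff --numstat` (which prints added, deleted and path per file) because that is easier to sum, and the function name `closest_commits` is hypothetical.

```shell
# closest_commits TREE RANGE [PATH...]
# Print the five revisions in RANGE whose diff against TREE touches the
# fewest lines, as "<added+deleted> <revision>".
closest_commits() {
    tree=$1
    range=$2
    shift 2
    for rev in $(git rev-list "$range"); do
        # binary files show "-" in --numstat and count as 0 here
        changed=$(git diff --numstat "$rev" "$tree" -- "$@" |
            awk '{ n += $1 + $2 } END { print n + 0 }')
        printf '%s %s\n' "$changed" "$rev"
    done | sort -n | head -n 5
}
```

For the question's case this might be called as `closest_commits foreign/master 'v1.9.0^..v1.9.5' mod/assignment`.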

knittl
4

This was my solution:

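#!/bin/sh
# Rank the commits in a date range by how many files differ from needle_ref.

start_date="2012-03-01"
end_date="2012-06-01"
needle_ref="aaa"

# start with an empty results file
: > /tmp/script.out
# note: --all also walks the history of needle_ref itself, if it is a ref in this repo
shas=$(git log --oneline --all --after="$start_date" --until="$end_date" | cut -d' ' -f 1)
for sha in $shas
do
    # number of files that differ between the needle and this commit
    wc=$(git diff --name-only "$needle_ref" "$sha" | wc -l)
    # zero-pad so that a plain sort orders the counts correctly
    wc=$(printf %04d $wc)
    echo "$wc $sha" >> /tmp/script.out
done
# the five commits with the fewest differing files
sort /tmp/script.out | head -5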
mattalxndr
  • I ended up having to specify the branch (master), removing --all, because it also searched the needle_ref, resulting in zero diffs. – Kyle Mar 15 '16 at 15:58
  • 1
    As @kyle mentions; this script is good, except the last line, that ends up picking the lowest diff - if the date range includes the checkin we are comparing (ie the needle_ref), then that wins with 0 files different. I recommend changing the last line to: "cat /tmp/script.out | grep -v ^$ | sort | head -5 " - this will show the 5 checkins with the fewest filechanges. – thetoolman Jun 12 '17 at 22:35
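The two comments above point at the same pitfall: the needle commit itself can end up in the candidate list and win with zero differences. A minimal sketch of one way around it, assuming a single named branch is searched instead of --all and the needle's own hash is skipped explicitly; the function name `rank_by_changed_files` and its argument order are hypothetical:

```shell
# rank_by_changed_files NEEDLE BRANCH SINCE BEFORE
# Print the five commits on BRANCH between the two dates that differ from
# NEEDLE in the fewest files, as "<zero-padded count> <revision>".
rank_by_changed_files() {
    needle=$1
    branch=$2
    since=$3
    before=$4
    needle_sha=$(git rev-parse "$needle")
    git rev-list --after="$since" --until="$before" "$branch" |
    while read -r sha; do
        # never compare the needle with itself
        [ "$sha" = "$needle_sha" ] && continue
        count=$(git diff --name-only "$needle" "$sha" | awk 'END { print NR }')
        printf '%04d %s\n' "$count" "$sha"
    done | sort | head -n 5
}
```

With the values from the script this would be `rank_by_changed_files aaa master 2012-03-01 2012-06-01`.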
1

Some really great solutions here!

I used something similar to find the closest source file revision (given a target file):

  1. Iterate backwards through all commits in the branch merge.
  2. Look for the closest match with the file target.txt.
  3. Print the git revision and the number of differing lines of text.

N.B. Run this inside a new, throw-away branch: reset --hard is destructive (afaik).

for REV in $(git rev-list merge); do
    git reset --hard "$REV"
    # comm expects sorted input; the <(...) process substitution requires bash
    echo "$REV" $(comm -2 -3 <(sort source.txt) <(sort ../target.txt) | wc -l)
done

You'll get output like the following, which tells you which revision was the closest match (i.e. least differing lines):

1c58bd5925a1fc8233730626**************** 771
HEAD is now at ...
9b2c29b00f1b4541a4135906**************** 775
HEAD is now at ...
b8e0bf5ec4372ebbcbd4edd0**************** 342
HEAD is now at ...
ba0d474bf2aac40dae48923e**************** 342
HEAD is now at ...
6d96921d3e9ad760ce55e76c**************** 335 <-- Closest match
HEAD is now at ...
795cd4caae5a5b08563443c9**************** 396
HEAD is now at ...
8743f42b24dd77e3bcc897dd**************** 399
HEAD is now at ...
d1b74dd33074c17da3fff638**************** 929

Further reading:

  • comm - for outputting differing lines
  • wc - for counting lines of text

Credit:

Nick Grealy
  • I think this needs `<(sort source.txt) <(sort ../target.txt)`. Sorry I can't test it out right now. – Noumenon Oct 26 '20 at 19:24
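The same search can be done without touching the working tree at all. The sketch below is an alternative, not the approach above: it reads each historical version with `git show REV:PATH` and counts differing lines with `diff` rather than `comm`, so neither `reset --hard` nor sorted input is needed. The function name `closest_file_revision` is hypothetical.

```shell
# closest_file_revision BRANCH PATH TARGET
# Print "<differing lines> <revision>" for the revision of PATH on BRANCH
# that is closest to the file TARGET.
closest_file_revision() {
    branch=$1
    path=$2
    target=$3
    # only commits that touched PATH can contain a new version of it
    for rev in $(git rev-list "$branch" -- "$path"); do
        # count the "<" and ">" lines of a plain diff against the target
        lines=$(git show "$rev:$path" 2>/dev/null | diff - "$target" | grep -c '^[<>]')
        printf '%s %s\n' "$lines" "$rev"
    done | sort -n | head -n 1
}
```

For the example above this would be `closest_file_revision merge source.txt ../target.txt`. The counts will not match the `comm` numbers exactly, since `diff` is order-sensitive and counts lines from both files.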
0

How about using git to create a patch from each version of 1.8 and 1.9 to this new release? Then you could see which patch makes the most 'sense'.

For example, if the patch 'removes' many methods, then it is probably not this release, but one before. If the patch has many sections that don't make sense as a single edit, then it probably isn't this release either.

And so on... Unfortunately, no algorithm exists that does this perfectly. It will have to be some heuristic.

rafalotufo
-2

How about using 'git blame'? It will show you, for each line, who changed it, and in which revision.

rafalotufo
  • 4
    this only works for commits with history, and that is exactly the problem here: missing history and branching points – knittl Jun 17 '11 at 16:15