Consider the following situation:

I have the source code of a project (several folders and files). Unfortunately, the source code is not under version control. However, there is a remote git repository for this project.

How can I find the commit in the git repository that corresponds to my revision of the project?


My brute-force approach would be: clone the repository, check out different commits one by one, and compare the checked-out files with my files.

Is there a more elegant way to find the commit?

sergej

2 Answers

I presume the two obvious approaches aren't available (looking at a VERSIONS/README file, or looking for a version number in the tar.gz file you downloaded).

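You could try the following. It should be significantly faster than the brute-force approach because it compares tree hashes and avoids checking out each commit. It only works if your copy matches a committed revision exactly, i.e. you have not modified, added, or removed any files.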

  1. git init a new repository
  2. git add the entirety of your copy of the source and then git commit
  3. git rev-parse HEAD^{tree} to get the tree-SHA the code corresponds to
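The three steps above can also be sketched as one shell function. This is only a sketch under a few assumptions: the function name tree_sha_of_dir is made up, the scratch repository is kept in a temporary directory so that no .git folder ends up inside your source, and git write-tree is used in place of git commit plus git rev-parse HEAD^{tree} (it writes the staged files as a tree object and prints its SHA, so no committer identity is needed).

```shell
# Hypothetical helper: print the tree SHA of the files in a directory.
# Assumes git and mktemp are available.
tree_sha_of_dir() {
    src=$1
    scratch=$(mktemp -d) || return 1
    git init --quiet "$scratch" || return 1
    (
        cd "$src" || exit 1
        # Point git at the scratch repository, but use the source
        # directory as the work tree.
        export GIT_DIR="$scratch/.git" GIT_WORK_TREE="$PWD"
        # Stage everything, then write the index out as a tree object.
        git add -A && git write-tree
    )
    status=$?
    rm -rf "$scratch"
    return $status
}
```

Running tree_sha_of_dir on your copy of the source prints the value to use as the needle in the search script.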

Now git clone the remote, and then in that repo execute a script like

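# Replace this placeholder with the full 40-character tree SHA from step 3
# (git rev-parse prints full, lowercase hashes, so an abbreviation will not match)
needle=ABCDEF12345
# You could limit this to just HEAD or a branch if you had a guess
for rev in $(git rev-list --all)
do
   # Quote both sides so an empty result cannot break the test
   if [ "$(git rev-parse "${rev}^{tree}")" = "${needle}" ]
   then
      echo "Commit ${rev} matches"
   fi
done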

Once you have a matching SHA, you could try to find a release for it with git tag --points-at <sha>, which lists any tags that point at that commit.

Andrew C

Yes, you can take advantage of the fact that git IDs are checksums of objects. A commit ID covers the contents of the directory, which is good, but it also covers the log message, the date of the commit, and other things you can't reproduce.

Git also stores the contents of the directory (i.e. all the files and directories) as a tree object. That also has an ID, which is a checksum of the contents: same contents, same checksum. You can reproduce this checksum. Here's how.

Initialize a new repository in your mystery directory. Add and commit all the files. Then check its tree hash with:

git log --pretty=format:'%T'

Then back in your real repository, search your history for a commit with the same tree.

git log --pretty=format:'ID: %H Tree: %T %s'
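If the history is long, scanning that output by eye gets tedious. A small sketch that prints only the commits whose tree matches could look like this. The function name commits_with_tree is made up, it has to be run inside the clone, and adding --all (to search every branch and tag rather than just the current branch) is my own choice.

```shell
# Hypothetical helper: print the IDs of all commits whose tree SHA
# equals the given one. Run it inside the cloned repository.
commits_with_tree() {
    needle=$1
    # One line per commit: "<commit SHA> <tree SHA>"
    git log --all --format='%H %T' |
        while read -r commit tree
        do
            if [ "$tree" = "$needle" ]
            then
                printf '%s\n' "$commit"
            fi
        done
}
```

Pass it the full tree hash obtained from the mystery directory. No output means no commit has exactly that tree.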

The caveat is that the mystery files have to be exactly the same as the committed ones, and all the files have to be there, even the .gitignore. The executable bit of each file also has to match (git records that bit, but not the rest of the permissions or the owner).

Once you're done you can delete the .git directory from the mystery repository, but more likely you should replace it with a clone and check out the correct version.


If that doesn't work because the directories aren't exactly the same, you can use a similar technique to zero in on a range of commits by using the checksum of a single, important, often edited file (which git calls a "blob"). On your mystery repository, do

git hash-object <important file>

Then you can search your project history for that hash. That's covered in this answer.
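One possible way to do that search, as a sketch: git log has a --find-object option (Git 2.16 or later, as far as I know) that keeps only commits whose changes add or remove the given object. The function name commits_with_blob is made up, and it has to be run inside the clone.

```shell
# Hypothetical helper: list commits that added or removed the blob
# with the given ID. Run it inside the cloned repository.
commits_with_blob() {
    blob=$1
    git log --all --format='%H' --find-object="$blob"
}
```

With a simple linear history this typically prints two commits: the one that introduced that version of the file and the one that replaced it. The mystery revision should then lie between the two, including the first and excluding the second.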

Schwern