
For a given file in a git repo, I'd like to look up the SHA of the last commit in which the file was modified, along with the timestamp.

At the command line, this data is visible with git log for a particular file path, e.g.

git log -n 1 path/to/file

Using the "git" gem for ruby I can also do this:

require 'git'
g = Git.open("/path/to/repo")
# Paths must be quoted strings; run the log query once and reuse the result
last = g.log(1).object("relative/path/to/file").first
modified = last.date
sha = last.sha

This works, but it runs too slowly when looping through lots of paths. Since Rugged uses C libraries instead, I was hoping it would be faster, but I cannot see how to construct the equivalent query in the Rugged syntax. Any suggestions?

Carlos Martín Nieto
cboettig

1 Answer


This should work:

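require 'rugged'

repo = Rugged::Repository.new("/path/to/repo")
walker = Rugged::Walker.new(repo)
walker.sorting(Rugged::SORT_DATE)  # newest commits first
walker.push(repo.head.target)      # start walking from HEAD
# Find the first commit with exactly one parent whose diff against that parent touches the file
commit = walker.find do |commit|
  commit.parents.size == 1 && commit.diff(paths: ["relative/path/to/file"]).size > 0
end
sha = commit.oid        # commit is nil if no commit matched
modified = commit.time  # commit timestamp as a Time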

Taken and adapted from https://github.com/libgit2/pygit2/issues/200#issuecomment-15899713

As an aside: the fact that Rugged is written in C does not make costly operations suddenly cheap and quick. You do save a lot of string parsing and similar overhead, but that is not always the bottleneck.

As you're not interested in the actual textual diff here, the libgit2 option GIT_DIFF_FORCE_BINARY might also help the performance of this lookup. Unfortunately it is not yet available in Rugged (but will be soon).

Testing this with the Rugged repo itself, it works correctly:

repo = Rugged::Repository.new(".")
walker = Rugged::Walker.new(repo)
walker.sorting(Rugged::SORT_DATE)
walker.push(repo.head.target)
commit = walker.find do |commit|
  commit.parents.size == 1 && commit.diff(paths: ["Gemfile"]).size > 0
end
sha = commit.oid # => "8f5c763377f5bf0fb88d196b7c45a7d715264ad4"

walker = Rugged::Walker.new(repo)
walker.sorting(Rugged::SORT_DATE)
walker.push(repo.head.target)
commit = walker.find do |commit|
  commit.parents.size == 1 && commit.diff(paths: [".travis.yml"]).size > 0
end
sha = commit.oid # => "4e18e05944daa2ba8d63a2c6b149900e3b93a88f"
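
Since the question is about looping over many paths, one possible way to cut the cost is to walk the history once and resolve all paths during that single walk, rather than starting a new walk per file. The following is only a rough sketch under some assumptions: `last_commits_for` is a hypothetical helper name (not part of Rugged), it keeps the same single-parent rule used above, and it relies on `Rugged::Diff#deltas` plus the `disable_pathspec_match: true` option mentioned in the comments below. It has not been benchmarked.

```ruby
# Sketch: resolve many paths in one history walk instead of one walk per path.
# `walker` is anything that yields commits newest-first, e.g. a Rugged::Walker
# that has been sorted and pushed as in the examples above.
# `last_commits_for` is a hypothetical helper, not a Rugged API.
def last_commits_for(walker, paths)
  remaining = paths.dup
  found = {}

  walker.each do |commit|
    break if remaining.empty?
    # Same rule as above: only consider commits with exactly one parent.
    next unless commit.parents.size == 1

    # One diff per commit, restricted to the paths that are still unresolved.
    diff = commit.diff(paths: remaining, disable_pathspec_match: true)
    touched = diff.deltas.flat_map { |d| [d.old_file[:path], d.new_file[:path]] }

    (remaining & touched).each do |path|
      found[path] = { sha: commit.oid, time: commit.time }
      remaining.delete(path)
    end
  end

  found
end

# Usage with Rugged (assumes the rugged gem is installed):
#   require 'rugged'
#   repo = Rugged::Repository.new("/path/to/repo")
#   walker = Rugged::Walker.new(repo)
#   walker.sorting(Rugged::SORT_DATE)
#   walker.push(repo.head.target)
#   last_commits_for(walker, ["Gemfile", ".travis.yml"])
```

The walk stops as soon as every path has been resolved. Paths that never show up in a single-parent commit (for example, files only touched in the root commit) are simply missing from the returned hash, so callers should check for that.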
  • Thanks! Um, not sure what I missed but does not seem to be working for me -- `commit` comes out as `nil`. Also it seems strange to me that the syntax requires that loop since neither of the methods in my question require a (explicit) loop, but query the log directly instead... – cboettig Jan 23 '14 at 21:19
  • I just updated my answer with some tiny fixes (it now uses the correct sorting and it ignores merge commits). – Arthur Schreiber Jan 24 '14 at 08:45
  • I also added some examples now. Make sure that the path that you give to `Rugged::Commit#diff` does actually exist in your repo, that might explain your `nil` value of commit. Also, you might want to pass `disable_pathspec_match: true` if you only pass fixed paths and not pathspecs to the `#diff` method. – Arthur Schreiber Jan 24 '14 at 08:51
  • @ArthurSchreiber how would `GIT_DIFF_FORCE_BINARY` be specified in your code example? – Mike Slinn Mar 07 '23 at 22:48