
I want to find the last commit that touched each file that ever existed on a set of branches. That is, for each file that ever existed on one or more of the specified branches, give me the last commit that touched it.

That commit could have added, modified, or deleted the file. I need the commit hash, but it would be nice if the same command also produced the file's status in that commit (A, M, D, etc.), the set of branches that reach the commit, and the commit date, so I don't have to go run more commands to generate them. I doubt I can get all of that in one go, but that's the ultimate set of information I need.

I know how to get a list of every file ever in the repository, but not how to reduce that to the set of files that ever existed on a given set of branches. Even if I generated such a file list, it seems inefficient to then go back and run a git log for each file. Is there a way to do it in one go and at least get the most recent commit hash for each such file?

I have tried this basic algorithm:

  1. Gather all files via git log --all --diff-filter=A --pretty=format: --name-only --date-order
  2. For each file, run git log -n1 --date-order --all --pretty=format:%H -- file
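Spelled out, the two steps look like the script below. It is a runnable sketch against a tiny throwaway repo (the repo setup and file names are just for demonstration):

```shell
#!/bin/sh
set -e
# Throwaway demo repo so the two steps can be run end to end.
repo=$(mktemp -d); cd "$repo"
git init -q
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@t
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@t
echo 1 > a.txt; git add a.txt; git commit -qm 'add a'
echo 2 > a.txt; git add a.txt; git commit -qm 'edit a'

# Step 1: every path ever added, collected once.
git log --all --diff-filter=A --pretty=format: --name-only --date-order |
  sort -u | sed '/^$/d' > files.txt

# Step 2: one `git log` history walk per path -- this is the part that
# costs a few seconds per file on a large repository.
while IFS= read -r f; do
  printf '%s\t%s\n' "$(git log -n1 --date-order --all --pretty=format:%H -- "$f")" "$f"
done < files.txt
```

Each iteration of step 2 is a full history traversal, which is why the total cost scales with the number of files.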

Step 1 takes a while (perhaps 30 seconds) but I can live with that since it's only done once.

Step 2 takes 3-4 seconds for each invocation of git log, which is much too slow when dealing with thousands of files.

I'm looking for some way to do this more efficiently, probably via plumbing.

Alternatively, if there's a way to speed up git log that could be a solution as well.
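For the "one go" part, one approach (a sketch, not a full solution: it gets hash, status letter, and date, but not the branch set) is to make a single newest-first pass over git log --all --name-status and keep only the first commit seen for each path:

```shell
#!/bin/sh
set -e
# Demo repo: a.txt is added then deleted, b.txt is added.
repo=$(mktemp -d); cd "$repo"
git init -q
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@t
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@t
echo 1 > a.txt; git add a.txt; git commit -qm 'add a'
echo 2 > b.txt; git add b.txt; git commit -qm 'add b'
git rm -q a.txt; git commit -qm 'delete a'

# Single traversal: header lines (leading tab) carry hash and date,
# name-status lines carry status letter and path. Newest commits come
# first, so the first line seen for a path is its last-touching commit.
git log --all --date-order --name-status --pretty=format:'%x09%H%x09%cI' |
awk -F'\t' '
  NF >= 3 && $1 == "" { hash = $2; date = $3; next }   # commit header
  NF >= 2 {
    p = $NF                                            # last field: path
    if (!(p in seen)) { seen[p] = 1
                        print hash "\t" $1 "\t" date "\t" p }
  }'
```

For the branch set you would still need one git branch --contains per surviving commit, but that is one cheap call per commit, not one history walk per file. Renames show up with an R status and the new path in the last field.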

David Greene

2 Answers


The "hard" way to do this is to walk the repo yourself by parsing the output of git log. That will get messy really fast, and you'll probably have a hard time getting exactly what you want.

The "easy" way is to look at the git objects directly. Here is a starting point: the idea is that you can build this information yourself by examining the objects git stores.

The "hard" way is actually easier to start but messier. The "easy" way is more work up front, but you probably have better odds of getting it right.

Hope this helps.

Mircea

That's quite a bunch of requirements... I'd try to get the desired output first before worrying about efficiency.

Here are some pointers which can be put together to create a script:

  1. This answer to generate a list of tracked files (git ls-tree or git log)
  2. This answer to get the most recent commit for each (git log)
  3. Some variation of git status to get the status for each
  4. This answer to get the branches that contain a certain commit (git branch)
  5. Standard command-line utilities to display all that jazz nicely
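As a rough illustration of how those pointers could be glued together (a sketch against a throwaway demo repo; it checks only the current branch and ignores efficiency, as suggested above):

```shell
#!/bin/sh
set -e
# Throwaway demo repo with a single committed file.
repo=$(mktemp -d); cd "$repo"
git init -q
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@t
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@t
echo 1 > a.txt; git add a.txt; git commit -qm 'add a'

# 1. Tracked files on the current branch.
git ls-tree -r --name-only HEAD | while IFS= read -r f; do
  # 2.+3. Most recent commit touching the file, with its status letter
  # (simplified word-splitting; assumes paths without whitespace).
  set -- $(git log -n1 --pretty=format:%H --name-status -- "$f" | tr '\n\t' '  ')
  hash=$1; status=$2
  # 4. Branches that contain that commit.
  branches=$(git branch --format='%(refname:short)' --contains "$hash" | paste -sd, -)
  # 5. Display all that jazz.
  printf '%s\t%s\t%s\t%s\n' "$hash" "$status" "$branches" "$f"
done
```

This is one full history walk per file plus one branch lookup per file, so it has exactly the efficiency problem the question describes; it only shows the shape of the output.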

Hope that helps to reach your goal.

Jens Hoffmann
  • It's basically what I've already done but it's painfully slow. The git log bit dominates by far. Hence the efficiency issue. – David Greene Jul 02 '15 at 22:44
  • Ah I see, maybe mention in your Q that you tried this already (SO is very keen on people showing that they tried before asking) and reformulate your Q to specifically ask about efficiency – Jens Hoffmann Jul 02 '15 at 22:49
  • Is it also slow on one branch only? If not, you could use these steps for one branch and run parallel jobs across the branches you want? – Jens Hoffmann Jul 02 '15 at 22:52
  • I've edited the question to indicate I tried a basic algorithm. Thanks for the pointer! It is slow on just one branch as well. If I don't use -- filename it's nearly instantaneous, but of course that doesn't do what I want. – David Greene Jul 03 '15 at 02:35
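The parallel-jobs idea from the comments can be sketched with xargs -P (supported by GNU and BSD xargs, though not strictly POSIX). This variant parallelizes the per-file git log calls rather than the per-branch work, but the shape is the same; the demo repo and file names are hypothetical:

```shell
#!/bin/sh
set -e
# Demo repo with two files added in one commit.
repo=$(mktemp -d); cd "$repo"
git init -q
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@t
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@t
echo 1 > a.txt; echo 2 > b.txt; git add .; git commit -qm 'add both'

# Run the slow per-file lookups four at a time instead of serially.
git log --all --diff-filter=A --pretty=format: --name-only --date-order |
  sort -u | sed '/^$/d' |
  xargs -P4 -I{} sh -c \
    'printf "%s\t%s\n" "$(git log -n1 --date-order --all --pretty=format:%H -- "$1")" "$1"' _ {}
```

Note that output order is nondeterministic under -P, so sort the result if you need a stable listing. This only divides the constant factor; each file still costs a full history walk.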