
I need to loop over every folder in a directory and find the user responsible for the first and last commit to each one. Is there a smart way to do this in Git Bash? I tried using the subprocess module in Python to loop through the folders, but I'm not sure that is a good approach.

What I have tried:

  • git log -- path/to/folder: this lists all commits to that subfolder, but I only want the first and last commit. I also want to loop through all folders in the directory.
  • The replies in this Stack Overflow thread: they didn't work for me (they either printed nothing or gave an error).
oskros
  • Did you have a look at GitPython module? https://gitpython.readthedocs.io/en/stable/ – ypnos Nov 23 '20 at 12:44
  • Also you might have a look at this question: https://stackoverflow.com/questions/10073154/git-log-follow-the-gitpython-way – ypnos Nov 23 '20 at 12:53
  • At first glance it seems they focus on commit history for files, and not directories so it doesn't fully solve it for me. But I will have a more thorough look, thanks! – oskros Nov 23 '20 at 12:56
  • I thought it would work the same for directories, but I guess that was a misconception on my side. – ypnos Nov 23 '20 at 13:22
  • Can you describe in more detail what doesn't work with `git log -- path/to/folder`? – LeGEC Nov 23 '20 at 14:13
  • Yes of course, edited my question – oskros Nov 23 '20 at 14:22

2 Answers


Assuming you are interested in the current branch only, you can get the first commit via Git Bash with

git rev-list HEAD -- path/to/folder | tail -1

and the last commit with

git rev-list HEAD -- path/to/folder | head -1

git rev-list is similar to git log, but it is a "plumbing" command. Plumbing commands are a bit less user-friendly than "porcelain" commands like git log, but they are meant to behave consistently regardless of your personal settings, whereas porcelain commands may produce different output depending on your config. Because of this, it's usually a good idea to use plumbing commands when writing scripts or programs.

git rev-list returns only the commit hash by default, but you can use --pretty/--format options similar to git log.
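As a sketch of what the format option gives you, the Python helper below (the function names are hypothetical) asks git rev-list for the author name of every commit touching a path. One difference from git log: git rev-list prints a `commit <hash>` header line before each formatted entry, so the parser drops those lines. It assumes it is run from inside the repository.

```python
import re
import subprocess

# git rev-list --format=... prints "commit <hash>" before each entry (40 hex digits, or 64 in SHA-256 repos)
COMMIT_HEADER = re.compile(r"commit (?:[0-9a-f]{40}|[0-9a-f]{64})")


def parse_rev_list_format(output: str) -> list[str]:
    """Keep only the formatted lines, dropping commit header lines and blanks."""
    return [line for line in output.splitlines() if line and not COMMIT_HEADER.fullmatch(line)]


def authors_for_path(path: str) -> list[str]:
    """Author name (%aN, .mailmap-aware) of every commit on HEAD touching `path`, newest first."""
    output = subprocess.check_output(
        ["git", "rev-list", "--format=%aN", "HEAD", "--", path], encoding="utf-8"
    )
    return parse_rev_list_format(output)
```

With authors_for_path('path/to/folder'), the last element is the author of the first commit and the first element is the author of the most recent one.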

head and tail read their input (here, the entire list of commits for the path) and print only the first or last n lines, where n is the number you pass (-1 means a single line). Both git log and git rev-list show the most recent commit first, so tail gives you the first commit and head gives you the last.
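For anyone scripting this outside Git Bash, the same head/tail logic can be sketched in Python. The helper names are hypothetical, and the parsing is kept separate from the subprocess call so it can be checked on sample output.

```python
import subprocess
from typing import Optional, Tuple


def first_and_last(rev_list_output: str) -> Optional[Tuple[str, str]]:
    """Return (first, last) commit hashes from newest-first `git rev-list` output."""
    hashes = rev_list_output.split()
    if not hashes:
        return None  # no commit on HEAD touches the path
    # Newest commit comes first: hashes[-1] is what `tail -1` prints, hashes[0] what `head -1` prints
    return hashes[-1], hashes[0]


def first_and_last_commit(path: str) -> Optional[Tuple[str, str]]:
    """Python equivalent of piping `git rev-list HEAD -- <path>` into `tail -1` and `head -1`."""
    output = subprocess.check_output(
        ["git", "rev-list", "HEAD", "--", path], encoding="utf-8"
    )
    return first_and_last(output)
```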

You could also get the last commit using

git rev-list HEAD -1 -- path/to/folder

without piping to head. However, you cannot get the first commit the same way with Git's built-in commit-limiting options, because, for example,

git rev-list HEAD --reverse -1 -- path/to/folder

applies the -1 limiter first, selecting only the most recent commit, and only then applies --reverse.
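To make the ordering concrete, here is a toy Python model (not Git's actual implementation) of the rule that commit-limiting options such as -1 are applied before --reverse. The commit names are made up.

```python
def rev_list(commits_newest_first, max_count=None, reverse=False):
    """Toy model of git rev-list ordering: limit first, then reverse."""
    selected = list(commits_newest_first)
    if max_count is not None:
        selected = selected[:max_count]  # the -N limiter is applied first...
    if reverse:
        selected.reverse()  # ...and only then is the order flipped
    return selected


history = ["c3", "c2", "c1"]  # newest first, as rev-list prints it
print(rev_list(history, max_count=1, reverse=True))  # ['c3'], the newest commit
print(rev_list(history, reverse=True)[0])  # c1, the first commit
```

The last line corresponds to git rev-list HEAD --reverse -- path/to/folder | head -1, which reverses the full list and then takes its first entry.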

Finally, it's worth noting that Git doesn't truly track directories, only files. If you create a folder with no files in it, it's not possible to commit that folder, and if you delete all the files within a folder, then as far as Git is concerned that folder doesn't exist anymore. The upshot is: these commands will get you the first and last commits that touch any file within the directory (and its subdirectories) as opposed to the directory itself. This distinction may or may not be important for your scenario.
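Because Git records only file paths, a folder exists in the repository only as the prefix of some tracked file. The hypothetical helper below sketches this by deriving top-level folder names from the output of git ls-files: an empty directory, or one holding only untracked files, never shows up.

```python
import subprocess


def top_level_folders(tracked_paths: list[str]) -> set[str]:
    """Folder names implied by tracked file paths; Git always separates them with '/'."""
    return {path.split("/", 1)[0] for path in tracked_paths if "/" in path}


def tracked_top_level_folders(repo_dir: str) -> set[str]:
    """Top-level folders of `repo_dir` that contain at least one tracked file."""
    # -z gives NUL-separated output, so unusual file names are not quoted
    output = subprocess.check_output(
        ["git", "-C", repo_dir, "ls-files", "-z"], encoding="utf-8"
    )
    return top_level_folders([p for p in output.split("\0") if p])
```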

Daniel Smith

In the end I solved my issue with subprocess:

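import os
import subprocess

dir_path = os.path.normpath('C:/folder_path')  # must be inside a Git working tree
for f in os.listdir(dir_path):
    subpath = os.path.join(dir_path, f)
    if not os.path.isdir(subpath):  # only look at folders, skip plain files
        continue
    # %aN/%aE = author name/email (respecting .mailmap), %as = author date as YYYY-MM-DD
    subprocess_args = ['git', '-C', dir_path, 'log',
                       "--pretty=format:{'author': '%aN', 'date': '%as', 'email': '%aE'}",
                       '--', f]
    commits = subprocess.check_output(subprocess_args).decode().splitlines()
    if not commits:  # no committed files in this folder
        continue
    # git log lists the newest commit first, so the last line is the first commit
    print(f'{f} -- first: {commits[-1]}, last: {commits[0]}')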
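The pseudo-dictionary strings printed above are awkward to process further. A possible variant, assuming a different output format is acceptable, separates the fields with the ASCII unit separator (Git's %x1f placeholder emits byte 0x1f) and turns each line into a real dict. The function names are illustrative.

```python
import subprocess
from typing import Dict, List, Optional, Tuple

# %x1f emits byte 0x1f, which is very unlikely to appear in names or emails
LOG_FORMAT = "--pretty=format:%aN%x1f%aE%x1f%as"


def parse_log(output: str) -> List[Dict[str, str]]:
    """Turn newest-first `git log` lines into dicts with author, email and date."""
    entries = []
    for line in output.splitlines():
        author, email, date = line.split("\x1f")
        entries.append({"author": author, "email": email, "date": date})
    return entries


def first_and_last_authors(repo_dir: str, folder: str) -> Optional[Tuple[Dict[str, str], Dict[str, str]]]:
    """(first, last) commit info for `folder`, or None if no commit touches it."""
    output = subprocess.check_output(
        ["git", "-C", repo_dir, "log", LOG_FORMAT, "--", folder], encoding="utf-8"
    )
    entries = parse_log(output)
    return (entries[-1], entries[0]) if entries else None
```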
oskros