I have a Git repository with several thousand files, and would like to get the date and time of the last commit for each individual file. Can this be done using Python (e.g., by using something like os.path.getmtime(path))?

Rintze Zelle

4 Answers

With GitPython, this would do the job:

import git
repo = git.Repo("./repo")
tree = repo.tree()
for blob in tree:
    commit = next(repo.iter_commits(paths=blob.path, max_count=1))
    print(blob.path, commit.committed_date)

Note that commit.committed_date is in "seconds since epoch" format.
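
To turn that value into something readable, the standard library is enough. A minimal sketch, assuming Python 3 and that UTC is an acceptable time zone for display (the helper name format_commit_time is made up for this example):

```python
from datetime import datetime, timezone


def format_commit_time(seconds):
    """Render a seconds-since-epoch value as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# Timestamp 0 is the epoch itself.
print(format_commit_time(0))  # 1970-01-01T00:00:00+00:00
```

In the loop above, this could be called as format_commit_time(commit.committed_date).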

a3nm
Marian

An interesting question. Below is a quick and dirty implementation. I've used multiprocessing.Pool.imap_unordered() to start the git subprocesses in parallel because it's convenient.

#!/usr/bin/env python
# vim:fileencoding=utf-8:ft=python
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# Last modified: 2015-05-24 12:28:45 +0200
#
# To the extent possible under law, Roland Smith has waived all
# copyright and related or neighboring rights to gitdates.py. This
# work is published from the Netherlands. See
# http://creativecommons.org/publicdomain/zero/1.0/

"""For each file in a directory managed by git, get the short hash and
date of the most recent commit of that file."""

from __future__ import print_function
from multiprocessing import Pool
import os
import subprocess
import sys
import time

# Suppress annoying command prompts on ms-windows.
startupinfo = None
if os.name == 'nt':
    startupinfo = subprocess.STARTUPINFO()
    startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW


def main():
    """
    Entry point for gitdates.
    """
    checkfor(['git', '--version'])
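    # Bail out if this directory is not the top of a git working tree.
    if '.git' not in os.listdir('.'):
        print('This directory is not managed by git.')
        sys.exit(0)
    # Get the set of excluded (ignored) files. Decode the output so that the
    # names compare equal to str paths on Python 3 as well.
    exargs = ['git', 'ls-files', '-i', '-o', '--exclude-standard']
    exc = subprocess.check_output(exargs, startupinfo=startupinfo)
    exc = set(exc.decode().splitlines())
    # Get a list of all files that are not excluded.
    allfiles = []
    for root, dirs, files in os.walk('.'):
        for d in ['.git', '__pycache__']:
            try:
                dirs.remove(d)
            except ValueError:
                pass
        for f in files:
            path = os.path.join(root, f)
            # git reports paths relative to '.', using '/' as separator.
            if path[2:].replace(os.sep, '/') not in exc:
                allfiles.append(path)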
    # Gather the files' data using a Pool.
    p = Pool()
    filedata = [res for res in p.imap_unordered(filecheck, allfiles)
                if res is not None]
    p.close()
    # Sort the data (latest modified first) and print it
    filedata.sort(key=lambda a: a[2], reverse=True)
    dfmt = '%Y-%m-%d %H:%M:%S %Z'
    for name, tag, date in filedata:
        print('{}|{}|{}'.format(name, tag, time.strftime(dfmt, date)))


def checkfor(args, rv=0):
    """
    Make sure that a program necessary for using this script is available.
    Calls sys.exit when this is not the case.

    Arguments:
        args: String or list of strings of commands. A single string may
            not contain spaces.
        rv: Expected return value from invoking the command.
    """
    if isinstance(args, str):
        if ' ' in args:
            raise ValueError('no spaces in single command allowed')
        args = [args]
    try:
        with open(os.devnull, 'w') as bb:
            rc = subprocess.call(args, stdout=bb, stderr=bb,
                                 startupinfo=startupinfo)
        if rc != rv:
            # Pass a message along so the report below has something to show.
            raise OSError(rc, 'unexpected return value')
    except OSError as oops:
        outs = "Required program '{}' not found: {}."
        print(outs.format(args[0], oops.strerror))
        sys.exit(1)


def filecheck(fname):
    """
    Start a git process to get file info. Return the filename, the
    abbreviated commit hash and the author date of the latest commit.

    Arguments:
        fname: Name of the file to check.

    Returns:
        A 3-tuple containing the file name, latest short hash and the
        author date as a time.struct_time in UTC, or None if git has no
        commit for the file.
    """
    args = ['git', '--no-pager', 'log', '-1', '--format=%h|%at', fname]
    try:
        b = subprocess.check_output(args, startupinfo=startupinfo)
        data = b.decode()[:-1]
        h, t = data.split('|')
        out = (fname[2:], h, time.gmtime(float(t)))
    except (subprocess.CalledProcessError, ValueError):
        return None
    return out


if __name__ == '__main__':
    main()

Example output:

serve-git|8d92934|2012-08-31 21:21:38 +0200
setres|8d92934|2012-08-31 21:21:38 +0200
mydec|e711e27|2008-04-09 21:26:05 +0200
sync-iaudio|8d92934|2012-08-31 21:21:38 +0200
tarenc|8d92934|2012-08-31 21:21:38 +0200
keypress.sh|a5c0fb5|2009-09-29 00:00:51 +0200
tolower|8d92934|2012-08-31 21:21:38 +0200

Edit: Updated to use the os.devnull (that works on ms-windows as well) instead of /dev/null.

Edit2: Used startupinfo to suppress command prompts popping up on ms-windows.

Edit3: Used __future__ to make this compatible with both Python 2 and 3. Tested with 2.7.9 and 3.4.3. Now also available on GitHub.
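
The script above starts one git process per file. As a rough sketch of a different approach (not what the script does), a single `git log --name-only` run over the whole history can be parsed instead. The marker byte, the function names and the parsing rules below are assumptions made for this example. It also assumes that paths contain no newlines and do not start with the marker. Note that git may quote unusual path names, that deleted files show up as well, and that the result may differ from a per-file `git log -1` in repositories with complicated merges.

```python
import subprocess

MARK = '\x01'  # hypothetical marker; assumed not to start any file name


def parse_log(text):
    """Map each path to the timestamp of the first commit that lists it.

    `text` is expected to hold commits in git's default order (newest
    first), each as a MARK-prefixed timestamp line followed by the paths
    that the commit touched.
    """
    latest = {}
    stamp = None
    for line in text.split('\n'):
        if line.startswith(MARK):
            stamp = int(line[len(MARK):])
        elif line and stamp is not None:
            # Keep only the first timestamp seen for each path.
            latest.setdefault(line, stamp)
    return latest


def last_commit_times(repo_dir='.'):
    """Run one `git log` in repo_dir and parse its output."""
    args = ['git', '--no-pager', 'log', '--name-only', '--format=%x01%at']
    out = subprocess.check_output(args, cwd=repo_dir)
    return parse_log(out.decode('utf-8', 'replace'))
```

Calling last_commit_times('.') in a working tree would return a dict from path to author timestamp (seconds since epoch), which can then be sorted or formatted as in the script.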

Rintze Zelle
Roland Smith
  • Thanks. I probably should have mentioned that I'm trying to do this under Windows, though, so the `/dev/null` bit won't work. – Rintze Zelle Oct 28 '12 at 14:32
  • @RintzeZelle Answer updated to use the more portable `os.devnull`. – Roland Smith Oct 28 '12 at 16:04
  • Thanks! I just had to add the path to the git executable to the Windows PATH system variable. Command prompts are opened (and closed) for each file though, although it might be possible to suppress that: http://stackoverflow.com/questions/1016384/cross-platform-subprocess-with-hidden-window – Rintze Zelle Oct 29 '12 at 01:49

You can use the GitPython library.

Marian
Ashwini Chaudhary
  • Thanks. I found that library already, but I couldn't make much sense out of its documentation (I haven't really studied up on the inner workings of Git). – Rintze Zelle Oct 28 '12 at 14:04

This works for me. See the GitPython tutorial:

http://gitpython.readthedocs.io/en/stable/tutorial.html#the-tree-object

As per the documentation: "As trees allow direct access to their intermediate child entries only, use the traverse method to obtain an iterator to retrieve entries recursively."

tree.traverse() returns a generator object which does the work:

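import git
repo = git.Repo("./repo")
tree = repo.tree()

print(tree.traverse())
<generator object traverse at 0x0000000004129DC8>

# Walk the tree recursively and look up the latest commit for each entry.
for blob in tree.traverse():
    commit = next(repo.iter_commits(paths=blob.path, max_count=1))
    print(blob.path, commit.committed_date)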
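
Note that traverse() yields subdirectories (tree objects) as well as files. A minimal sketch for keeping only the files, assuming each entry exposes the `type` and `path` attributes that GitPython objects have (the helper blob_paths and the stand-in Entry tuple are made up for this example):

```python
from collections import namedtuple


def blob_paths(entries):
    """Yield the path of every entry whose type is 'blob' (a file)."""
    for entry in entries:
        if entry.type == 'blob':
            yield entry.path


# Stand-in objects for illustration only; real entries would come from
# tree.traverse().
Entry = namedtuple('Entry', ['type', 'path'])
demo = [Entry('tree', 'src'), Entry('blob', 'src/main.py'),
        Entry('blob', 'README')]
print(list(blob_paths(demo)))  # ['src/main.py', 'README']
```

With a real repository, the loop would then read `for path in blob_paths(tree.traverse()):`.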
user3399495