git pre-commit hook: check only the code that is actually about to be committed?

Question

I wanted to set up a pre-commit hook that checks whether my Python code follows PEP 8, so I followed the approach in https://www.stavros.io/posts/more-pep8-git-hooks/

I created the file .git/hooks/pre-commit with this content:

#!/bin/sh
flake8 .

Then I made it executable: chmod +x .git/hooks/pre-commit

But when I enter git commit, it actually checks the whole branch and if it finds anything in branch that does not follow pep8, it will terminate (that repository is a bit old and some code did not follow pep8 from the start, so I know it could be refactored, but I don't need pre-hook to tell me that about already committed code).

How can I make it only check the code for the current commit that is supposed to be committed?

Doesn't quite answer this question, but this is handled in http://pre-commit.com. This project uses a combination of `git diff --staged --name-only` and `git checkout` + `git apply` (essentially a stash). — anthony sottile, Sep 20 '16 at 13:52

larsks · Answer 1 · 2016-04-22T13:24:48.240

3

How can I make it only check the code for the current commit that is supposed to be committed?

Use git checkout-index to check out the files you are committing into a temporary directory, then run flake8 on that temporary directory.

First, create a temporary directory:

tmpdir=$(mktemp -d commitXXXXXX)
trap 'rm -rf "$tmpdir"' EXIT

Then check out the files to be committed into that directory:

git checkout-index --prefix=$tmpdir/ -af

Get a list of files modified in this commit and run flake8 against them:

git diff --cached --name-only --diff-filter=ACM | grep '\.py$' |
(cd $tmpdir; xargs --no-run-if-empty flake8)
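
Put together, the three steps above might look like the following sketch. The function name check_staged_python and the CHECKER override variable are hypothetical, added only so the checker can be swapped out for experimenting; it also assumes file names without whitespace, since xargs splits on it.

```shell
#!/bin/sh
# Sketch: run a checker on the staged version of added/copied/modified
# *.py files only. CHECKER is a hypothetical override; default is flake8.
check_staged_python() {
    tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/commit.XXXXXX") || return 1

    # Export the index (the staged snapshot), not the work tree.
    if ! git checkout-index --prefix="$tmpdir/" -af; then
        rm -rf "$tmpdir"
        return 1
    fi

    status=0
    # Names of staged Python files that were added, copied, or modified.
    staged=$(git diff --cached --name-only --diff-filter=ACM | grep '\.py$' || true)
    if [ -n "$staged" ]; then
        # Run the checker inside the export so it reads staged content.
        (cd "$tmpdir" && printf '%s\n' "$staged" | xargs ${CHECKER:-flake8}) || status=$?
    fi

    rm -rf "$tmpdir"
    return "$status"
}
```

Saved as .git/hooks/pre-commit with a final line that calls check_staged_python, the hook's exit status would be the function's status, so a failing check aborts the commit.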

edited Apr 22 '16 at 13:24

answered Apr 22 '16 at 12:02


Hm, I should probably modify the script I just wrote to use `git checkout-index` rather than plain `git checkout` (won't need the `--work-tree` argument then). – torek Apr 22 '16 at 12:30
Am I doing something wrong? I added this code instead of the one I've written, but still it checks more than my current commit. Steps I did. Replaced pre-commit hook with the code you've written, Changed one file a bit to be against pep8, then added the file and then have written `git commit`. Then got whole bunch of warnings about other modules being incorrect. – Andrius Apr 22 '16 at 12:48
That is because I was in a hurry earlier and skipped a step. All fixed. – larsks Apr 22 '16 at 13:24
@larsks: For situations where a git repo has literally 20k files, but commits consist of 1-5 files (with some standard deviation, but you know what I mean), do you still think that a complete `git checkout-index` is best? Or would just running a `git show :{}` (on each file) within the xargs be better practice? – Mort Apr 27 '16 at 18:06
That might be better. The repositories I work with don't tend to be that large, and the tools I'm running in pre-commit prefer to deal with files on disk rather than stdin. Either should work. – larsks Apr 27 '16 at 18:24
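
The per-file alternative raised in the comments above could be sketched as follows. It assumes a flake8 version that accepts `-` for stdin together with `--stdin-display-name`; the names check_staged_via_show, lint_stdin, and the LINT_STDIN override are hypothetical.

```shell
#!/bin/sh
# Default linter: reads file content on stdin, reports it under name $1.
# Assumes flake8 supports "-" and --stdin-display-name.
lint_stdin() {
    flake8 --stdin-display-name "$1" -
}

# Sketch: feed each staged *.py file to the linter straight from the
# index, without exporting the whole tree. LINT_STDIN is a hypothetical
# override naming another command or function to use instead.
check_staged_via_show() {
    status=0
    list=$(git diff --cached --name-only --diff-filter=ACM | grep '\.py$' || true)
    [ -n "$list" ] || return 0
    while IFS= read -r path; do
        # ":path" names the staged (index) version of the file.
        git show ":$path" | ${LINT_STDIN:-lint_stdin} "$path" || status=1
    done <<EOF
$list
EOF
    return "$status"
}
```

This avoids writing thousands of files to disk in a large repository, at the cost of one git show and one linter process per changed file.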

score 2 · Answer 2 · edited May 23 '17 at 10:34

First, let's answer the question you actually asked, because you need to know this eventually anyway:

How can I make it only check the code for the current commit that is supposed to be committed?

The contents of the commit-that-will-be-made are whatever is in the index.

This is every file, not just files you recently git added. For instance, if your work-tree has eight source-controlled *.py Python files, and you modified and git add-ed two of them, there are eight *.py files that will be committed this time. Every commit has every file, and not a set of changes from some previous file(s).¹

In order to check those eight files (and not the current contents of the work-tree), you need to extract the contents of the index somewhere.
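
To see the distinction for yourself, the two lists can be compared directly. This is a small sketch with hypothetical function names; the second function assumes the repository already has at least one commit, since it compares against HEAD.

```shell
#!/bin/sh
# Every path the next commit will contain: all entries in the index.
next_commit_files() {
    git ls-files
}

# Only the paths whose staged content differs from the current commit
# (added, modified, or deleted relative to HEAD).
staged_changes() {
    git diff --cached --name-only HEAD
}
```

In the eight-file example, next_commit_files would print all eight names, while staged_changes would print only the two that were git add-ed.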

Of course, this is not what you actually want:

it actually checks the whole branch and if it finds anything in branch that does not follow pep8, it will terminate (that repository is a bit old and some code did not follow pep8 from the start, so I know it could be refactored, but I don't need pre-hook to tell me that about already committed code)

Again, it does not actually check the branch; it checks the work tree. While this does not immediately help you get where you need to go, it is important.

What we need here is to extract, perhaps from the index, the files that are different from the current commit.

The way to do this is to make a new commit. Ideally, we might make this commit somewhere other than on the current branch, because if we make it on the branch, we have to un-make it again later.

There is a command that does this—that makes commits from the index, but not on a branch—and that command is git stash. Unfortunately, there is a bug in git stash when you go to use it this way. Rather than using the work-arounds I describe in that answer, we can do something different: we can make our own tree in a temporary directory, after comparing the current index to the HEAD commit.

The script for this (actually tested, even!) is below.

¹You might object here that git log -p or git show shows you the changes. You are correct that it shows you changes, but it does so by running git diff against a previous commit, which also contains every file. It is by comparing the previous version of "everything" to the next version of "everything" that git discovers what changed.

#! /bin/sh

# run-checks: run some checking command(s) on a proposed commit.
#
# Optionally, run it only on files that differ from those in
# the current commit (added or modified, treating rename as
# modify), and/or do not run it at all if there are
# no such files (e.g., if the commit consists only of file
# removals).

usage()
{
    echo "usage: $0 [-d] checkcmd [args ...]" 1>&2
    exit 1
}

# probably should use git rev-parse feature now, oh well
diffmode=false
skipempty=true
while true; do
    case "$1" in
    -d|--diff) diffmode=true; shift;;
    -z|--run-even-if-empty) skipempty=false; shift;;
    -dz) diffmode=true; skipempty=false; shift;;
    *) break;;
    esac
done

case "$#" in
0) usage;;
esac

# from here on, exit on error
set -e

# get temporary directory and arrange to clean it up
# (an explicit XXXXXX template works with both GNU and BSD mktemp)
tdir=$(mktemp -d "${TMPDIR:-/tmp}/run-checks.XXXXXX")
trap 'rm -rf "$tdir"' 0 1 2 3 15

# Get list of changed files (whether or not we are using
# only the changed files).  This includes deleted files.
# For efficiency, we treat renames as delete/add pairs here.
# Require that new commit not match current commit.
if test $(git diff --cached --name-only --no-renames HEAD | wc -l) -eq 0; then
    echo "no changes to test before committing"
    exit 1
fi

# Populate work tree in temp dir.  If we only want changed
# files, limit the checkout to files added or modified.  Note
# that this list might be empty.
if $diffmode; then
    git diff --cached --name-only --no-renames --diff-filter=AM -z HEAD |
        xargs -0 git --work-tree=$tdir checkout -f --
else
    git --work-tree=$tdir checkout -f -- .
fi

# Now run checker in temp work tree.  Our exit status is
# its exit status.  Do not use exec since we must still clean
# up the temp dir, and optionally skip checker if work tree is empty.
cd $tdir
if test $(ls -A | wc -l) -eq 0; then
    is_empty=true
else
    is_empty=false
fi
if $skipempty && $is_empty; then exit 0; fi

if $is_empty; then
    "$@"
else
    "$@" *
fi
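
One way to wire this script into a hook is sketched below. It assumes the script has been saved as an executable named run-checks somewhere outside the hook itself, and that the repository has an ordinary .git directory; the helper name install_hook and any path passed to it are hypothetical.

```shell
#!/bin/sh
# install_hook REPO SCRIPT: write a pre-commit hook into REPO that runs
# SCRIPT in diff mode (-d) with flake8 as the check command.
# Assumes REPO/.git is a plain directory (not a worktree or submodule).
install_hook() {
    repo=$1
    script=$2
    hook="$repo/.git/hooks/pre-commit"
    mkdir -p "$repo/.git/hooks"
    # The hook simply hands control to the checking script.
    printf '#!/bin/sh\nexec "%s" -d flake8\n' "$script" > "$hook"
    chmod +x "$hook"
}
```

With -d, only files added or modified relative to HEAD are checked out and passed to flake8, so existing PEP 8 violations in untouched files do not block the commit.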
