I'm writing a Git pre-commit hook in Python, and I'd like to define a blacklist, like a `.gitignore` file, to check files against before processing them. Is there an easy way to check whether a file matches a set of `.gitignore` rules? The rules are kind of arcane, and I'd rather not have to reimplement them.

-
See this maybe: http://stackoverflow.com/a/25230908/154762 – solarc Dec 13 '16 at 21:28
-
Yes, I can implement the logic using the `fnmatch` function. Unfortunately, the `fnmatch` function in Python doesn't support the FNM_PATHNAME flag, and there aren't any functions which support the `**` used by git that I can find. – Chris B. Dec 13 '16 at 21:32
-
Also, why is this question getting downvoted? Is there a better place to ask this question? – Chris B. Dec 13 '16 at 21:33
-
I didn't downvote myself but as I said, it "smells XYish", I suspect that's the main source of downvotes. – torek Dec 13 '16 at 21:35
-
I want to create a blacklist of files in git that don't get linted when they get committed, just like `.gitignore` stores a blacklist of files that don't get stored in git. They seem like very, very similar problems, and that suggests very, very similar solutions to me. Is there a better approach? – Chris B. Dec 13 '16 at 21:42
-
if you're feeling slightly adventurous, you could look at `dir.c` as part of the git source code. It seems to contain a couple of functions that, when combined, should do what you want. Can't promise it's easier than writing a parser yourself, but wanted to have mentioned it anyway. At the very least, you'll learn something about how git works. – Thijs van Dien Dec 13 '16 at 23:56
-
I don't understand the reasoning of this being an XY problem and why this question was downvoted. The descriptive explanation in the question may be a bit specific, but it explains the context. The title of the question illustrates the actual problem and there is no suggestion of the question being biased towards a specific solution. – Eddy Jul 04 '18 at 10:06
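To make the `fnmatch` limitation from the comments concrete, here is a small check using Python's standard `fnmatch.fnmatchcase`. The example paths are made up; the Git behaviour noted in the comments follows the gitignore documentation:

```python
from fnmatch import fnmatchcase

# In Git, "dir/*.py" matches only files directly inside dir/, because "*"
# does not match "/" (the FNM_PATHNAME behaviour). Python's fnmatch has no
# such flag: its "*" also matches "/", so the pattern leaks into subdirectories.
print(fnmatchcase("dir/b.py", "dir/*.py"))      # True  (Git agrees)
print(fnmatchcase("dir/sub/c.py", "dir/*.py"))  # True  (Git: no match)

# In Git, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b". fnmatch treats "**"
# like a plain "*", so both slashes around it are still required.
print(fnmatchcase("a/x/y/b", "a/**/b"))  # True
print(fnmatchcase("a/b", "a/**/b"))      # False (Git: match)
```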
2 Answers
Assuming you're in the directory containing the `.gitignore` file, one shell command will list the files that Git is tracking:
git ls-files
From Python you can simply call:
import os
os.system("git ls-files")
but that only prints the list. To capture it as a list of file names, use:
import subprocess
list_of_files = subprocess.check_output(["git", "ls-files"]).splitlines()
If you want to list the untracked files (ignored or not), add the option `--others`; to list only the ignored ones, also add `--ignored --exclude-standard`:
git ls-files --others
git ls-files --others --ignored --exclude-standard
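To capture the ignored files from Python rather than just printing them, here is a minimal sketch, assuming Python 3 and that it runs inside the work tree. It asks `git ls-files` for untracked files (`--others`) that match the standard ignore rules (`--ignored --exclude-standard`), with `-z` so pathnames come back NUL-terminated and unquoted. The helper names `split_nul` and `ignored_files` are made up for illustration:

```python
import os
import subprocess

def split_nul(output):
    """Split NUL-terminated `git ... -z` output into str pathnames."""
    # The trailing NUL leaves an empty last element, which is dropped here.
    return [os.fsdecode(p) for p in output.split(b"\0") if p]

def ignored_files():
    """Untracked files excluded by .gitignore, .git/info/exclude and
    core.excludesFile (the "standard" ignore rules)."""
    out = subprocess.check_output(
        ["git", "ls-files", "--others", "--ignored", "--exclude-standard", "-z"])
    return split_nul(out)
```

`--ignored` needs at least one `--exclude*` option to know which rules to apply; `--exclude-standard` supplies the usual ones.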

This is rather klunky, but should work:
- create a temporary Git repository
- populate it with your proposed `.gitignore`
- also populate it with one file per pathname
- use `git status --porcelain` on the resulting temporary repository
- empty it out (remove it entirely, or preserve it as empty for the next pass, whichever seems more appropriate).
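The steps above can be sketched in Python roughly like this. It assumes Python 3.2+ and a `git` on `PATH`; `ignored_paths` is a hypothetical helper name. A path that `git status` no longer reports as untracked (`??`) is taken to be ignored. Any global `core.excludesFile` rules also apply inside the scratch repository, and the `GIT_*` scrubbing is a precaution in case the hook's environment points Git at the real repository:

```python
import os
import subprocess
import tempfile

def ignored_paths(gitignore_text, paths):
    """Return the members of `paths` that `gitignore_text` would ignore."""
    # Drop GIT_* variables (a hook may inherit e.g. GIT_DIR or GIT_INDEX_FILE)
    # so every git command below acts on the scratch repository only.
    env = {k: v for k, v in os.environ.items() if not k.startswith("GIT_")}
    with tempfile.TemporaryDirectory() as tmpdir:
        subprocess.check_call(["git", "init", "-q"], cwd=tmpdir, env=env)
        with open(os.path.join(tmpdir, ".gitignore"), "w") as f:
            f.write(gitignore_text)
        for path in paths:
            full = os.path.join(tmpdir, path)
            os.makedirs(os.path.dirname(full), exist_ok=True)
            open(full, "w").close()  # only the name matters, not the contents
        # -z: NUL-terminated, unquoted entries such as b"?? dir/b.py".
        # --untracked-files=all: list files inside untracked directories
        # individually instead of collapsing them to "dir/".
        out = subprocess.check_output(
            ["git", "status", "--porcelain", "-z", "--untracked-files=all"],
            cwd=tmpdir, env=env)
    untracked = {os.fsdecode(entry[3:]) for entry in out.split(b"\0")
                 if entry.startswith(b"?? ")}
    return [p for p in paths if p not in untracked]
```

For example, with the rules `*.log` and `build/`, the paths `debug.log` and `build/out.py` come back as ignored while `a.py` does not.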
This does, however, smell like an XY problem. The klunky solution to Y is probably a poor solution to the real problem X.
Post-comment answer with details (and side notes)
So, you have some set of files to lint, probably from inspecting the commit. The following code may be more generic than you need (we don't really need the status part in most cases), but I include it for illustration:
import subprocess
proc = subprocess.Popen(['git',
                         'diff-index',       # use plumbing command, not user diff
                         '--cached',         # compare index vs HEAD
                         '-r',               # recurse into subdirectories
                         '--name-status',    # show status & pathname
                         # '--diff-filter=AM',  # optional: only A and M files
                         '-z',               # use machine-readable output
                         'HEAD'],            # the commit to compare against
                        stdout=subprocess.PIPE)
text = proc.stdout.read()
status = proc.wait()
if status:  # Git returns 0 on success
    raise subprocess.CalledProcessError(status, 'git diff-index')
Now we need something like `pairwise` from the Stack Overflow question "Iterating over every two elements in a list":
import sys
if sys.version_info[0] >= 3:
    izip = zip
else:
    from itertools import izip

def pairwise(it):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(it)
    return izip(a, a)
and we can break up the `git diff-index` output with:
for state, path in pairwise(text.split(b'\0')):
    ...
We now have a state (`b'A'` = added, `b'M'` = modified, and so on) for each file. (Be sure to check for state `T` if you allow symlinks, in case a file changes from ordinary file to symlink, or vice versa. Note that we're depending on `pairwise` to discard the unpaired empty `b''` string at the end of `text.split(b'\0')`, which is there because Git produces a NUL-terminated list rather than a NUL-separated list.)
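To see the pairing concretely, here is the same `pairwise` idea (Python 3 only, so plain `zip`) applied to a hand-written sample of `git diff-index --cached -r --name-status -z HEAD` output. The sample bytes are invented for illustration, and keeping only added or modified `.py` files is just one possible filter:

```python
def pairwise(it):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(it)
    return zip(a, a)

# Invented sample output: status and pathname alternate, each NUL-terminated.
text = b"M\0a.py\0A\0dir/b.py\0D\0old.py\0M\0README\0A\0z.py\0"

fields = text.split(b"\0")      # last element is b'' because of the final NUL
pairs = list(pairwise(fields))  # zip stops early, dropping the unpaired b''

# Lint only added or modified files (a deleted file has nothing left to
# lint), and only Python sources.
candidates = [path for state, path in pairs
              if state in (b"A", b"M") and path.endswith(b".py")]
print(candidates)  # [b'a.py', b'dir/b.py', b'z.py']
```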
Let's assume that at some point we collect up the files-to-maybe-lint into a list (or iterable) called `candidates`:
>>> candidates
[b'a.py', b'dir/b.py', b'z.py']
I will assume that you have avoided putting `.gitignore` into this list-or-iterable, since we plan to take it over for our own purposes.
Now we have two big problems: ignoring some files, and getting the version of those files that will actually be linted.
Just because a file is listed as modified doesn't mean that the version in the work-tree is the version that will be committed. For instance:
$ git status
$ echo foo >> README
$ git add README
$ echo bar >> README
$ git status --short
MM README
The first `M` here means that the index version differs from `HEAD` (this is what we got from `git diff-index` above), while the second `M` means that the index version also differs from the work-tree version.
The version that will be committed is the index version, so that, not the work-tree version, is what we need to lint.
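The two status columns in the `MM README` example can also be read programmatically. A small sketch, assuming the `git status --short` / `--porcelain` v1 layout of two status letters, a space, then the path; `describe_xy` is a hypothetical helper, and it does not handle rename entries:

```python
def describe_xy(line):
    """Interpret one short-status line such as "MM README"."""
    x, y, path = line[0], line[1], line[3:]
    untracked_or_ignored = x in "?!"  # "??" untracked, "!!" ignored
    return {
        "path": path,
        # X column: index vs HEAD, the difference `git diff-index --cached` reports
        "staged": not untracked_or_ignored and x != " ",
        # Y column: work tree vs index, i.e. the work tree is not what gets committed
        "unstaged": not untracked_or_ignored and y != " ",
    }
```

A file with both flags set, like `README` above, is exactly the case where linting the work-tree copy would check the wrong content.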
So, now we need a temporary directory. The thing to use here is `tempfile.mkdtemp` if your Python is old, or the fancier context-manager version (`tempfile.TemporaryDirectory`, Python 3.2 and later) if not. Note that we have byte-string pathnames above when working with Python 3, and ordinary (string) pathnames when working with Python 2, so this also is version dependent.
Since this is ordinary Python, not tricky Git interaction, I leave this part as an exercise, and I'll just gloss right over all the bytes-vs-strings pathname stuff. :-) However, for the `--stdin -z` bit below, note that Git will need the list of file names as `b'\0'`-separated bytes.
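A sketch of the glossed-over part, assuming Python 3.2+: `os.fsencode` turns either `str` or `bytes` pathnames into bytes, so the same code accepts both, and `tempfile.TemporaryDirectory` removes the directory when the `with` block ends. The function name `stdin_for_checkout_index` is made up for illustration:

```python
import os
import tempfile

def stdin_for_checkout_index(paths):
    """Join pathnames (str or bytes) into the NUL-separated bytes that
    `git checkout-index -z --stdin` reads."""
    return b"\0".join(os.fsencode(p) for p in paths)

with tempfile.TemporaryDirectory() as tmpdir:
    # tmpdir is a str, suitable for subprocess.Popen(..., cwd=tmpdir)
    payload = stdin_for_checkout_index([b"a.py", "dir/b.py"])
    ...  # run git checkout-index here, then lint the files under tmpdir
```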
Once we have the (empty) temporary directory, in a format suitable for passing to `cwd=` in `subprocess.Popen`, we now need to run `git checkout-index`. There are a few options, but let's go this way:
import os
proc = subprocess.Popen(['git', 'rev-parse', '--git-dir'],
                        stdout=subprocess.PIPE)
git_dir = proc.stdout.read().rstrip(b'\n')
status = proc.wait()
if status:
    raise subprocess.CalledProcessError(status, 'git rev-parse')
if sys.version_info[0] >= 3:  # XXX ugh, but don't want to getcwdb etc
    git_dir = git_dir.decode('utf8')
git_dir = os.path.join(os.getcwd(), git_dir)  # unchanged if git_dir is already absolute
proc = subprocess.Popen(['git',
                         '--git-dir={}'.format(git_dir),
                         'checkout-index', '-z', '--stdin'],
                        stdin=subprocess.PIPE, cwd=tmpdir)
proc.stdin.write(b'\0'.join(candidates))
proc.stdin.close()
status = proc.wait()
if status:
    raise subprocess.CalledProcessError(status, 'git checkout-index')
Now we want to write our special ignore file to `os.path.join(tmpdir, '.gitignore')`. Of course we also need `tmpdir` to act like its own Git repository now. These three things will do the trick:
import shutil
subprocess.check_call(['git', 'init'], cwd=tmpdir)
shutil.copy(os.path.join(git_dir, '.pylintignore'),
            os.path.join(tmpdir, '.gitignore'))
subprocess.check_call(['git', 'add', '-A'], cwd=tmpdir)
as we will now be using Git's ignore rules with the `.pylintignore` file we copied to `.gitignore`.
Now we would just need one more `git status` pass (with `-z` for `b'\0'`-style output, like `git diff-index`) to deal with ignored files, but there's a simpler method. After `git add -A`, the only untracked files left are the ignored ones, so we can get Git to remove them:
subprocess.check_call(['git', 'clean', '-fdqx'], cwd=tmpdir)  # -d: untracked dirs too; -x: including ignored files
shutil.rmtree(os.path.join(tmpdir, '.git'))
os.remove(os.path.join(tmpdir, '.gitignore'))
and now everything in `tmpdir` is precisely what we should lint.
Caveat: if your Python linter needs to see imported code, you won't want to remove files. Instead, you'll want to use `git status` or `git diff-index` to compute the ignored files. Then you'll want to repeat the `git checkout-index`, but with the `-a` option, to extract all files into the temporary directory.
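For the keep-everything variant in the caveat, here is a sketch of computing the ignored files with `git status` inside the scratch repository, assuming the `git init`, `.gitignore` copy, and `git add -A` steps have already run there. In porcelain output, ignored entries are marked `!!`; the helper names are hypothetical:

```python
import subprocess

def ignored_from_status(output):
    """Pick the ignored paths out of `git status --porcelain -z` output.

    Each entry is "XY PATH"; ignored files use the status "!!".
    """
    return [entry[3:] for entry in output.split(b"\0")
            if entry.startswith(b"!! ")]

def ignored_in(tmpdir):
    # --ignored shows ignored files; --untracked-files=all lists the
    # individual files inside an ignored directory rather than just "dir/".
    out = subprocess.check_output(
        ["git", "status", "--porcelain", "-z", "--ignored",
         "--untracked-files=all"], cwd=tmpdir)
    return ignored_from_status(out)
```

The result is a list of byte-string pathnames relative to `tmpdir`, which you can then leave out of the linter's arguments.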
Once done, just remove the temp directory as usual (always clean up after yourself!).
Note that some parts of the above are tested piecewise, but assembling it all into full working Python 2 or Python 3 code remains an exercise.
-
Well, my problem is I would like to create a blacklist of files that shouldn't be linted, just like `.gitignore` allows you to define a blacklist of files that shouldn't be stored in the repo. The obvious solution seemed to me to use the `.gitignore` format to track them. – Chris B. Dec 13 '16 at 21:37
-
Ah, so, we want files that aren't *already* ignored. That actually helps, because now we can use `core.excludesFile`. – torek Dec 13 '16 at 21:45
-
... or maybe not, because such files are already in the index, hence will not be ignored. Back to the klunky method :-) – torek Dec 13 '16 at 22:09