11

What would be the simplest way to do .gitignore-style fnmatch() matching in Python? It looks like the stdlib does not provide a match() function that matches a path spec against a UNIX-style path pattern.

A .gitignore file has both paths and files with wildcards to be (black)listed.

fish2000
Mikko Ohtamaa
  • Why do regular expressions not work for you? – jdi Apr 06 '12 at 20:19
  • I prefer to accept only valid answers. – Mikko Ohtamaa Apr 06 '12 at 21:27
  • Maybe I ask too hard questions? :) – Mikko Ohtamaa Apr 06 '12 at 22:51
  • I guess you could look at it that way, or as just having unrealistic expectations. Some of your questions get good activity, some don't. And you have also had some down-voted or closed; a mixed bag really. It just really does motivate people to interact with your questions in this community when they know you are the type of person that can be helped. What's the point of trying to offer answers if nothing will please you? – jdi Apr 06 '12 at 22:56
  • Found an example GPL2 implementation here: http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/view/head:/bzrlib/globbing.py – Mikko Ohtamaa Apr 07 '12 at 00:12
  • I am not looking for amusement or karma, I am looking for answers. Correct answers will be accepted and I hope people will be up to that before even trying, so that I don't need to shamelessly downvote bad/incorrect answers. Good answers = good karma :) – Mikko Ohtamaa Apr 07 '12 at 00:18
  • @MikkoOhtamaa Consider changing the accepted answer.. – wim Sep 14 '18 at 03:03

2 Answers

24

There's a library called pathspec which implements the full .gitignore specification, including things like **/*.py; the documentation describes how to handle Git pattern matching (you can also see code).

>>> import pathspec
>>> spec_src = '**/*.pyc'
>>> spec = pathspec.PathSpec.from_lines(pathspec.patterns.GitWildMatchPattern, spec_src.splitlines())
>>> set(spec.match_files({"test.py", "test.pyc", "deeper/file.pyc", "even/deeper/file.pyc"}))
set(['test.pyc', 'even/deeper/file.pyc', 'deeper/file.pyc'])
>>> set(spec.match_tree("pathspec/"))
set(['__init__.pyc', 'gitignore.pyc', 'util.pyc', 'pattern.pyc', 'tests/__init__.pyc', 'tests/test_gitignore.pyc', 'compat.pyc', 'pathspec.pyc'])
Nuno André
David Fraser
9

If you want to use mixed UNIX wildcard patterns as listed in your .gitignore example, why not just take each pattern and use fnmatch.translate with re.search?

import fnmatch
import re

s = '/path/eggs/foo/bar'
pattern = "eggs/*"

re.search(fnmatch.translate(pattern), s)
# <_sre.SRE_Match object at 0x10049e988>

fnmatch.translate turns the wildcard pattern into a regular expression pattern that the re module can use.
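As a rough illustration of what that means in practice, the sketch below compiles the translated pattern and compares re.search with re.match on the same path. The variable names are only illustrative. Note that fnmatch does not treat the path separator specially, so `*` here can span several directories, which differs from .gitignore semantics.

```python
import fnmatch
import re

# Translate a shell-style wildcard into a regular expression and compile it.
regex = re.compile(fnmatch.translate("eggs/*"))

path = "/path/eggs/foo/bar"

# search() may start at any offset, so the translated pattern is
# effectively anchored only at the end of the path.
found_anywhere = regex.search(path) is not None

# match() starts at offset 0, so the whole path has to fit the pattern.
found_from_start = regex.match(path) is not None

print(found_anywhere, found_from_start)
```

Which of the two calls is appropriate depends on whether a pattern is meant to apply anywhere in the tree or only relative to a fixed root.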

Hidden UNIX files:

import fnmatch
import re

s = '/path/to/hidden/.file'
isHiddenFile = re.search(fnmatch.translate('.*'), s)
if not isHiddenFile:
    pass  # do something with it
jdi
  • Unfortunately this fails with such a simple fnmatch pattern like .* (ignore all UNIX hidden files). – Mikko Ohtamaa Apr 07 '12 at 00:12
  • @MikkoOhtamaa: I'm not sure I follow. My update shows that it properly matches a path to a hidden unix file. – jdi Apr 07 '12 at 00:21
  • @MikkoOhtamaa: Yea I give up. I don't get the correlations. Good luck! – jdi Apr 07 '12 at 01:25
  • But you were on the right track; the bug is more of a deeper issue with Python's fnmatch() itself. I'll mark this closed and leave this link for future generations as the solution: https://github.com/miohtama/vvv/tree/master/vvv/bzrlib – Mikko Ohtamaa Apr 07 '12 at 10:12
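
The problem with the `.*` pattern raised in the comments can be reproduced with the standard library alone: because re.search may begin at any offset, the translated pattern also matches the `.txt` tail of an ordinary file name. One possible workaround, sketched below under the assumption of POSIX-style paths, is to test each path component separately with fnmatch.fnmatchcase. The helper name is_hidden is hypothetical, and this is not a full .gitignore implementation.

```python
import fnmatch
import re
from pathlib import PurePosixPath

hidden_regex = re.compile(fnmatch.translate(".*"))

# search() can start in the middle of a name, so "notes.txt" matches via ".txt".
false_positive = hidden_regex.search("/path/to/notes.txt") is not None


def is_hidden(path):
    """Hypothetical helper: True if any path component starts with a dot.

    A ".." component would also count as hidden in this simple sketch.
    """
    parts = PurePosixPath(path).parts
    return any(fnmatch.fnmatchcase(part, ".*") for part in parts)


print(false_positive)
print(is_hidden("/path/to/hidden/.file"), is_hidden("/path/to/notes.txt"))
```

Matching per component keeps the wildcard from crossing a `/`, which is closer to how a pattern without a slash behaves in a .gitignore file.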