21

I am using Python 2.7 and trying to run:

glob('{faint,bright*}/{science,calib}/chip?/')

I get no matches. However, from the shell, echo {faint,bright*}/{science,calib}/chip? gives:

faint/science/chip1 faint/science/chip2 faint/calib/chip1 faint/calib/chip2 bright1/science/chip1 bright1/science/chip2 bright1w/science/chip1 bright1w/science/chip2 bright2/science/chip1 bright2/science/chip2 bright2w/science/chip1 bright2w/science/chip2 bright1/calib/chip1 bright1/calib/chip2 bright1w/calib/chip1 bright1w/calib/chip2 bright2/calib/chip1 bright2/calib/chip2 bright2w/calib/chip1 bright2w/calib/chip2

What is wrong with my expression?

bashaus
astabada
  • I don't think the glob module supports curly braces, see http://bugs.python.org/issue9584 – Andrew Clark Apr 10 '14 at 19:00
  • The [`fnmatch` module](https://docs.python.org/2/library/fnmatch.html) (used by `glob` to implement the filename matching) is not nearly sophisticated enough to support `{...}` brace expansion syntax. – Martijn Pieters Apr 10 '14 at 19:09

6 Answers

11

You can combine globbing with brace expansion by using the braceexpand package:

pip install braceexpand

Sample:

from glob import glob
from braceexpand import braceexpand

def braced_glob(path):
    # Expand the braces first, then glob each resulting pattern.
    matches = []
    for pattern in braceexpand(path):
        matches.extend(glob(pattern))
    return matches

>>> braced_glob('/usr/bin/{x,z}*k')
['/usr/bin/xclock', '/usr/bin/zipcloak']
Orwellophile
  • could you please clarify how this applies to `glob` ? – duff18 Feb 17 '21 at 17:42
  • @duff18 if you would please read the OP's question, you would see that he first needs to resolve the brace expansions, then apply glob.glob to each of the resultant results. – Orwellophile Mar 01 '21 at 08:42
  • that is not what OP said, he wants to use `glob` directly. The fact that he has to use a two-steps approach is not clear from your answer. – duff18 Mar 01 '21 at 13:52
  • I think that has been adequately covered by the earlier answers, I was just filling in the missing intel on brace expansion. But I will add an example that ties it all in, **just for you**. – Orwellophile Mar 09 '21 at 17:50
  • The answers are ordered by "votes" by default, so there is no guarantee that readers will get to your answer after having read some other explain-it-all answer. Hence your answer needs to be self-contained, as it is now. – duff18 Mar 10 '21 at 07:10
  • Just to add onto what @Orwellophile mentioned, and for the sake of completeness, here's another package that does the same thing: bracex (https://pypi.org/project/bracex/) – shahensha Sep 13 '22 at 20:03
8

The {...} syntax is known as brace expansion. The shell applies it as a separate step before globbing takes place.

It's not part of glob syntax, and Python's glob function does not support it.
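As a small illustration of why the original call finds nothing, the sketch below uses the standard library's fnmatch module, which (as noted in the comments on the question) is what glob relies on for name matching. The example names are made up for the demonstration; the point is only that braces and commas are treated as ordinary characters.

```python
from fnmatch import fnmatchcase

pattern = '{faint,bright*}'

# Braces and commas are literal characters here; only *, ? and [...]
# have a special meaning.
matches_faint = fnmatchcase('faint', pattern)
matches_literal = fnmatchcase('{faint,bright1}', pattern)

print(matches_faint)    # False
print(matches_literal)  # True
```

So the pattern only matches a name that itself starts with `{faint,bright` and ends with `}`, which is presumably not what is on disk.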

that other guy
5

Since {} isn't supported by glob() in Python, what you probably want is something like:

import os
import re

...

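# Match the same directories as {faint,bright*}/{science,calib}/chip?
# (glob's * and ? become [^/]* and [^/] in the regular expression).
match_dir = re.compile(r'(^|/)(faint|bright[^/]*)/(science|calib)/chip[^/]$')
for dirpath, dirnames, filenames in os.walk("/your/top/dir"):
    if match_dir.search(dirpath):
        do_whatever_with_files(dirpath, filenames)
        # OR
        do_whatever_with_subdirs(dirpath, dirnames)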
DouglasDD
3

As that other guy pointed out, Python doesn't support brace expansion directly. But since brace expansion is done before the wildcards are evaluated, you could do that yourself, e.g.,

result = glob('{faint,bright*}/{science,calib}/chip?/')

becomes

result = [
    f 
    for b in ['faint', 'bright*'] 
    for s in ['science', 'calib'] 
    for f in glob('{b}/{s}/chip?/'.format(b=b, s=s))
]
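If there are more than two brace groups, the nested comprehension can be generalized with itertools.product. The following is only a sketch: glob_product is a hypothetical helper name, not something from the glob module, and it assumes the template contains no literal braces other than the named str.format placeholders.

```python
from glob import glob
from itertools import product


def glob_product(template, **alternatives):
    """Glob every combination of the alternatives (hypothetical helper)."""
    # Sort the placeholder names so the iteration order is deterministic.
    names = sorted(alternatives)
    matches = []
    for combo in product(*(alternatives[name] for name in names)):
        # Fill the placeholders, then let glob handle * and ? as usual.
        pattern = template.format(**dict(zip(names, combo)))
        matches.extend(glob(pattern))
    return matches


result = glob_product(
    '{b}/{s}/chip?/',
    b=['faint', 'bright*'],
    s=['science', 'calib'],
)
```

Each placeholder plays the role of one brace group, so adding a third group only means adding one more keyword argument.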
Matthias Fripp
3

As stated in other answers, brace-expansion is a pre-processing step for glob: you expand all the braces, then run glob on each of the results. (Brace-expansion turns one string into a list of strings.)

Orwellophile recommends the braceexpand library. This feels to me like too small of a problem to justify a dependency (though it's a common problem that ought to be in the standard library, ideally packaged in the glob module).

So here's a way to do it with a few lines of code (it uses yield from, so it needs Python 3.3 or later).

import itertools
import re

def expand_braces(text, seen=None):
    if seen is None:
        seen = set()

    # Raw string, so the backslashes reach the regex engine unchanged.
    spans = [m.span() for m in re.finditer(r"\{[^{}]*\}", text)][::-1]
    alts = [text[start + 1 : stop - 1].split(",") for start, stop in spans]

    if len(spans) == 0:
        if text not in seen:
            yield text
        seen.add(text)

    else:
        for combo in itertools.product(*alts):
            replaced = list(text)
            for (start, stop), replacement in zip(spans, combo):
                replaced[start:stop] = replacement

            yield from expand_braces("".join(replaced), seen)

### testing

text_to_expand = "{{pine,}apples,oranges} are {tasty,disgusting} to m{}e }{"

for result in expand_braces(text_to_expand):
    print(result)

prints

pineapples are tasty to me }{
oranges are tasty to me }{
apples are tasty to me }{
pineapples are disgusting to me }{
oranges are disgusting to me }{
apples are disgusting to me }{

What's happening here is:

  1. Nested brackets can produce non-unique results, so we use seen to only yield results that haven't yet been seen.
  2. spans holds the start and stop indexes of all innermost, balanced brackets in the text. The order is reversed by the [::-1] slice, so that the indexes go from highest to lowest (this will be relevant later).
  3. Each element of alts is the corresponding list of comma-delimited alternatives.
  4. If there aren't any matches (the text does not contain balanced brackets), yield the text itself, ensuring that it is unique with seen.
  5. Otherwise, use itertools.product to iterate over the Cartesian product of comma-delimited alternatives.
  6. Replace the curly-bracketed text with the current alternative. Since we're replacing data in-place, it has to be a mutable sequence (list, rather than str), and we have to replace the highest indexes first. If we replaced the lowest indexes first, the later indexes would have changed from what they were in the spans. This is why we reversed spans when it was first created.
  7. The text might have curly brackets within curly brackets. The regular expression only found balanced curly brackets that do not contain any other curly brackets, but nested curly brackets are legal. Therefore, we need to recurse until there are no nested curly brackets (the len(spans) == 0 case). Recursion with Python generators uses yield from to re-yield each result from the recursive call.

In the output, {{pine,}apples,oranges} is first expanded to {pineapples,oranges} and {apples,oranges}, and then each of these is expanded. The oranges result would appear twice if we didn't request unique results with seen.

Empty brackets like the ones in m{}e expand to nothing, so this is just me.

Unbalanced brackets, like }{, are left as-is.

This is not an algorithm to use if high performance for large datasets is required, but it's a general solution for reasonably sized data.
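To connect this back to the original question, which was asked for Python 2.7, here is a minimal sketch that avoids yield from and feeds each expanded pattern to glob. It is a simplified, iterative variant of the same idea rather than the function above: expand_braces_simple and braced_glob are hypothetical names, and the expansion order may differ from the shell's.

```python
import re
from glob import glob

# An innermost group: braces that contain no other braces.
_INNERMOST = re.compile(r"\{([^{}]*)\}")


def expand_braces_simple(pattern):
    """Expand brace groups one at a time until none remain."""
    pending, done = [pattern], []
    while pending:
        text = pending.pop(0)
        match = _INNERMOST.search(text)
        if match is None:
            # Fully expanded; keep only the first occurrence.
            if text not in done:
                done.append(text)
            continue
        for alternative in match.group(1).split(","):
            pending.append(text[:match.start()] + alternative + text[match.end():])
    return done


def braced_glob(pattern):
    """Glob each brace-expanded pattern, dropping duplicate paths."""
    matches = []
    for expanded in expand_braces_simple(pattern):
        for path in glob(expanded):
            if path not in matches:
                matches.append(path)
    return matches


paths = braced_glob('{faint,bright*}/{science,calib}/chip?/')
```

The list-based duplicate checks keep the order stable at the cost of speed, which is in line with the performance caveat above.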

Jim Pivarski
0

The wcmatch library has an interface similar to Python's standard glob, with options to enable brace expansion, tilde expansion, and more. Enabling brace expansion, for example:

from wcmatch import glob

glob.glob('{faint,bright*}/{science,calib}/chip?/', flags=glob.BRACE)
Bluu