0

I have a folder that needs to contain certain files whose names contain a magic string. I have a list of all the files from os.listdir(sstable_dir_path), and a list of regexes, each of which is supposed to match one of those filenames. Is there any way to check this without a nested for loop?

SSTABLE_FILENAMES_REGEXES = [re.compile(r'md-\d+-big-CompressionInfo.db'), re.compile(r'md-\d+-big-Data.db'),
                             re.compile(r'md-\d+-big-Digest.crc32'), re.compile(r'md-\d+-big-Filter.db'),
                             re.compile(r'md-\d+-big-Index.db'), re.compile(r'md-\d+-big-Statistics.db'),
                             re.compile(r'md-\d+-big-Summary.db'), re.compile(r'md-\d+-big-TOC.txt')]

Example filenames:

md-146-big-CompressionInfo.db
md-146-big-Data.db
md-146-big-Digest.crc32
md-146-big-Filter.db
md-146-big-Index.db
md-146-big-Statistics.db
md-146-big-Summary.db
md-146-big-TOC.txt

How I currently do it:

all(any(regex.fullmatch(filename) for regex in SSTABLE_FILENAMES_REGEXES) for filename in os.listdir(sstable_dir_path))

Ema Il
  • Hey, you can use the OR operator (`|`) in a regex. – Beny Gj Apr 07 '21 at 14:51
  • As @BenyGj noted, something like [this](https://stackoverflow.com/questions/8609597/python-regular-expressions-or) then just use `re.compile` once on your single pattern with lots of `|` characters in it. – Frodnar Apr 07 '21 at 14:53
  • @KarlThornton What do you mean by two sets of []? Using | does help, thanks! – Ema Il Apr 08 '21 at 07:14

2 Answers

1

If you want, you can build a single regex of the form (?=.*^pattern1$)(?=.*^pattern2$). Each (?=...) is a positive lookahead, and the ^ and $ anchors emulate the "fullmatch" behaviour for each line.

You can then join the output of os.listdir() into a multi-line string to match against.

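import os
import re

SSTABLE_FILENAMES = [
    'big-CompressionInfo.db', 'big-Data.db', 'big-Digest.crc32', 'big-Filter.db',
    'big-Index.db', 'big-Statistics.db', 'big-Summary.db', 'big-TOC.txt'
]

# (?m) lets ^ and $ match at every line; (?s) lets .* span newlines.
# The raw f-string keeps \d from being read as a string escape.
regex = re.compile('(?ms)' +
    ''.join(rf'(?=.*^md-\d+-{re.escape(name)}$)'
    for name in SSTABLE_FILENAMES)
)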

>>> bool(regex.search('\n'.join(os.listdir(sstable_dir_path))))
True
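
To see how the lookahead pattern behaves on both a complete and an incomplete listing, here is a minimal self-contained sketch. It uses hard-coded lists in place of os.listdir(); the names complete and incomplete are made up for illustration.

```python
import re

SSTABLE_FILENAMES = [
    'big-CompressionInfo.db', 'big-Data.db', 'big-Digest.crc32', 'big-Filter.db',
    'big-Index.db', 'big-Statistics.db', 'big-Summary.db', 'big-TOC.txt',
]

# One lookahead per required file, all anchored at the same position.
regex = re.compile('(?ms)' + ''.join(
    rf'(?=.*^md-\d+-{re.escape(name)}$)' for name in SSTABLE_FILENAMES
))

# Hypothetical listings standing in for os.listdir(sstable_dir_path).
complete = [f'md-146-{name}' for name in SSTABLE_FILENAMES]
incomplete = complete[:-1]  # md-146-big-TOC.txt is missing

has_all = bool(regex.search('\n'.join(complete)))
has_all_incomplete = bool(regex.search('\n'.join(incomplete)))
print(has_all, has_all_incomplete)  # True False
```

Note that this only checks that every required name appears somewhere in the listing; extra files in the folder do not cause a failure.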
0
files = ['md-146-big-CompressionInfo.db', 
         'md-146-big-Data.db', 
         'md-146-big-Digest.crc32', 
         'md-146-big-Filter.db', 
         'md-146-big-Index.db', 
         'md-146-big-Statistics.db', 
         'md-146-big-Summary.db', 
         'md-146-big-TOC.txt']
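# Combine the compiled patterns from the question into one alternation.
pattern = '|'.join(regex.pattern for regex in SSTABLE_FILENAMES_REGEXES)
# Keep the filenames that fully match any of the alternatives.
res = [filename for filename in files if re.fullmatch(pattern, filename)]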

print(res)

result:

['md-146-big-CompressionInfo.db', 'md-146-big-Data.db', 'md-146-big-Digest.crc32', 'md-146-big-Filter.db', 'md-146-big-Index.db', 'md-146-big-Statistics.db', 'md-146-big-Summary.db', 'md-146-big-TOC.txt']
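
The list above shows which filenames match, but not whether every expected file is present. One possible way to check that, sketched here as an alternative technique (a capture group plus a set difference, not the code from this answer), is shown below. REQUIRED and missing_components are hypothetical names, and the listing is hard-coded in place of os.listdir().

```python
import re

# Component suffixes taken from the patterns in the question.
REQUIRED = {
    'CompressionInfo.db', 'Data.db', 'Digest.crc32', 'Filter.db',
    'Index.db', 'Statistics.db', 'Summary.db', 'TOC.txt',
}

# A single pattern; the named group captures which component matched.
SSTABLE_RE = re.compile(
    r'md-\d+-big-(?P<component>'
    + '|'.join(map(re.escape, sorted(REQUIRED)))
    + r')'
)

def missing_components(filenames):
    """Return the required components that no filename in the listing provides."""
    matches = map(SSTABLE_RE.fullmatch, filenames)
    found = {m['component'] for m in matches if m}
    return REQUIRED - found

# Hypothetical listing standing in for os.listdir(sstable_dir_path).
files = [f'md-146-big-{component}' for component in sorted(REQUIRED)]

print(missing_components(files))      # set()
print(missing_components(files[1:]))  # {'CompressionInfo.db'}
```

An empty result means every expected file was found, and a non-empty one names exactly what is missing, which a plain boolean check cannot report.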
Beny Gj