Using regex to find files with a particular pattern

Question

I am new to regex. I have read various tutorials, but I still cannot get my simple code to run.

My files are named like "c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_aa_3", "c1c3c4_aa_4", ... "c1c2c4_bb_41", "c1c8c9_cc_58", "c1c3c11_aa_19"

I want to find all the files whose names include "aa" (such as "c1c2c3_aa_3") and rename them so that "aa" becomes "zz" (giving "c1c2c3_zz_3").

So the last number and the first string before "_" should stay fixed; only the "aa" in the middle should change.

"c1", "c2", "c3" are some conditions. Also, the last numbers are quite random, so I do not know them in advance.

I am interested in using regex.

I tried this:

con_list1 = ["c1", "c2", ... "c8"]
con_list2 = ["c1", "c2", ... "c11"]
con_list3 = ["c1", "c2", ... "c10"]

for con1 in con_list1:
    for con2 in con_list2:
        for con3 in con_list3:
            if(os.path.exists("./" + con1 + con2 + con3 + "_aa(.*)")):
                os.rename("./" + con1 + con2 + con3 + "_aa(.*)", "./" + con1 + con2 + con3 + "_zz(.*)")

I want the last number of each renamed file to stay fixed:

"c1c2c3_aa_3" -> "c1c2c3_zz_3" "c1c2c3_aa_13" -> "c1c2c3_zz_13"

I am also interested in using regex and (.*) in the right way.

However, the above code does not seem to work.

I would appreciate help implementing this.

Unrelated - but if you want a fun way to train your regex-fu: try https://regexcrossword.com/ — Patrick Artner, Nov 02 '22 at 08:25

David Lador · Answer 1 · 2022-11-04T09:37:46.483

If you have a list like con_list1 = ["c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_aa_3", "c1c3c4_aa_4"] you may try something like:

import re


con_list1 = ["c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_aa_3", "c1c3c4_aa_4"]

regex = "_aa_"
subst = "_zz_"

for test_str in con_list1:
    result = re.sub(regex, subst, test_str, count=1)
    print(result)  # e.g. c1c2c4_zz_1

but the simplest way is:

con_list1 = ["c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_aa_3", "c1c3c4_aa_4"]
for test_str in con_list1:
    # str.replace returns a new string; it does not modify test_str in place
    print(test_str.replace('_aa_', '_zz_'))
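To keep the renamed values rather than compute them one at a time, they can be collected into a list. A minimal sketch, assuming the names are plain strings; the helper name rename_tag and the third sample name are hypothetical and only there for illustration:

```python
import re

def rename_tag(names, old="_aa_", new="_zz_"):
    """Return the names with the first occurrence of `old` replaced by `new`."""
    # re.escape keeps `old` literal even if it contains regex metacharacters
    pattern = re.compile(re.escape(old))
    # A callable replacement avoids any backslash interpretation inside `new`
    return [pattern.sub(lambda m: new, name, count=1) for name in names]

con_list1 = ["c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_bb_3"]
renamed = rename_tag(con_list1)
print(renamed)
```

Names without "_aa_" pass through unchanged, so the result lines up index by index with the input list.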

score 0 · Answer 2 · answered Nov 02 '22 at 08:25

Try this to find all names: "[a-z0-9]+_aa_[0-9]+"

names = re.findall(r'[a-z0-9]+_aa_[0-9]+', ' '.join(files_names_list), flags=re.I)

files_names_list is a list containing all your file names; joining it with spaces lets re.findall scan every name in one pass.

I hope I understood you correctly.
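If partial matches are a concern, the same pattern can also be checked against each name separately. A minimal sketch, assuming files_names_list is a plain list of strings; the sample names here are made up for illustration:

```python
import re

# Hypothetical sample names; only the first two follow "<conditions>_aa_<number>"
files_names_list = ["c1c2c4_aa_1", "C1C3C11_AA_19", "c1c2c4_bb_41", "c1c2c3_aa_2.bak"]

# re.I makes the match case-insensitive, so "_AA_" is accepted as well
pattern = re.compile(r"[a-z0-9]+_aa_[0-9]+", flags=re.I)

# fullmatch requires the whole name to fit the pattern,
# so "c1c2c3_aa_2.bak" is rejected even though it contains a matching part
names = [name for name in files_names_list if pattern.fullmatch(name)]
print(names)
```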

answered Nov 02 '22 at 08:25

Nero


Wiktor Stribiżew · Answer 3 · 2022-11-02T08:48:30.317

You can use

import os, re

con_list1 = ["c1", "c2", "c3","c4","c5","c6","c7","c8"]
con_list2 = ["c1", "c2", "c3","c4","c5","c6","c7","c8", "c9","c10", "c11"]
con_list3 = ["c1", "c2", "c3","c4","c5","c6","c7","c8", "c9","c10"]
regex = re.compile(f'^((?:{"|".join(map(re.escape, con_list1))})(?:{"|".join(map(re.escape, con_list2))})(?:{"|".join(map(re.escape, con_list3))}))_aa_')

rootdir = "YOUR_ROOT_DIR"
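for root, dirs, files in os.walk(rootdir):
    for file in files:
        if regex.search(file):
            # os.walk yields bare file names, so join them with root before renaming
            os.rename(os.path.join(root, file), os.path.join(root, regex.sub(r'\g<1>_zz_', file)))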

Note: os.walk() searches all subdirectories recursively. If you do not need that behavior, see Non-recursive os.walk().
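For the non-recursive case, one option is os.scandir(), which lists only the entries directly inside a directory. The sketch below is an illustration under assumptions, not part of the original answer: the helper rename_in_dir is hypothetical, the pattern is a simplified stand-in (three "c<number>" parts), and it runs against a throwaway temporary directory so that no real files are touched.

```python
import os
import re
import tempfile

# Simplified stand-in pattern (assumption): three "c<number>" parts, then "_aa_"
pattern = re.compile(r"^((?:c\d+){3})_aa_")

def rename_in_dir(directory):
    """Rename matching files directly inside `directory`, without recursion."""
    renamed = []
    # Collect the names first so the directory is not modified while it is scanned
    with os.scandir(directory) as entries:
        candidates = [entry.name for entry in entries if entry.is_file()]
    for name in candidates:
        new_name, count = pattern.subn(r"\g<1>_zz_", name)
        if count:
            os.rename(os.path.join(directory, name), os.path.join(directory, new_name))
            renamed.append((name, new_name))
    return renamed

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for name in ("c1c2c3_aa_3", "c1c2c4_bb_41", os.path.join("sub", "c1c2c8_aa_5")):
        open(os.path.join(tmp, name), "w").close()
    result = rename_in_dir(tmp)
    top_level = sorted(os.listdir(tmp))
    nested = os.listdir(os.path.join(tmp, "sub"))

print(result)
```

The file inside "sub" keeps its name, because only the top level of the directory is scanned.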

This is not the most efficient way to create a dynamic pattern (a regex TRIE would be better), but it shows a viable approach. The regex will look like

^((?:c1|c2|c3|c4|c5|c6|c7|c8)(?:c1|c2|c3|c4|c5|c6|c7|c8|c9|c10|c11)(?:c1|c2|c3|c4|c5|c6|c7|c8|c9|c10))_aa_

See the regex demo. Note that each item in your condition lists is re.escaped to make sure special chars do not prevent your file names from matching.

Details:

^ - start of string
((?:c1|c2|c3|c4|c5|c6|c7|c8)(?:c1|c2|c3|c4|c5|c6|c7|c8|c9|c10|c11)(?:c1|c2|c3|c4|c5|c6|c7|c8|c9|c10)) - Group 1 (\g<1> refers to this group value, if _zz_ is not a placeholder for text starting with a digit, you can even use \1 instead): a value from con_list1, then a value from con_list2 and then a value from con_list3
_aa_ - an _aa_ fixed string.
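The remark about \g<1> versus \1 can be checked on plain strings, with no file system involved. This is a hedged sketch: the pattern is built the same way as above, while the sample names and the digit-leading replacement "2x_" are made up purely to show the ambiguity.

```python
import re

con_list1 = [f"c{i}" for i in range(1, 9)]    # c1 .. c8
con_list2 = [f"c{i}" for i in range(1, 12)]   # c1 .. c11
con_list3 = [f"c{i}" for i in range(1, 11)]   # c1 .. c10

def alternation(items):
    # Escape each item so special characters are matched literally
    return "(?:" + "|".join(map(re.escape, items)) + ")"

regex = re.compile(
    "^(" + "".join(alternation(lst) for lst in (con_list1, con_list2, con_list3)) + ")_aa_"
)

with_g = regex.sub(r"\g<1>_zz_", "c1c11c10_aa_19")   # explicit group reference
with_1 = regex.sub(r"\1_zz_", "c1c2c3_aa_3")         # fine: "_" follows the 1
unchanged = regex.sub(r"\g<1>_zz_", "c9c1c1_aa_2")   # c9 is not in con_list1

# With a replacement that starts with a digit, \g<1> stays unambiguous ...
digit_ok = regex.sub(r"\g<1>2x_", "c1c2c3_aa_3")
# ... while \12 is read as a reference to group 12, which does not exist
try:
    regex.sub(r"\12x_", "c1c2c3_aa_3")
    ambiguous = False
except re.error:
    ambiguous = True

print(with_g, with_1, unchanged, digit_ok, ambiguous)
```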

score 0 · Answer 4 · answered Nov 02 '22 at 08:44

Assuming the files to rename exist in the current directory, would you please try the following:

import os, re
for f in os.listdir('.'):
    m = re.match(r'((?:c\d{1,2}){3})_aa_(\d{1,2})$', f)
    if m:
        newname = m.group(1) + '_zz_' + m.group(2)
        os.rename(f, newname)

((?:c\d{1,2}){3}) matches three repetitions of c followed by one or two digits.
(\d{1,2}) matches one or two digits.
Because both sub-patterns are enclosed in parentheses, the matched substrings are captured and available as m.group(1) and m.group(2) respectively.
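Before renaming anything on disk, it can help to preview the old-to-new mapping first. The following is only a sketch under assumptions: plan_renames is a hypothetical helper, and \d+ replaces \d{1,2} so that numbers of any length are accepted, which goes beyond what the answer itself uses.

```python
import re

# Assumption: \d+ instead of \d{1,2}, so numbers of any length are accepted
PATTERN = re.compile(r"((?:c\d+){3})_aa_(\d+)")

def plan_renames(names):
    """Map each matching name to its new name without touching the file system."""
    plan = {}
    for name in names:
        m = PATTERN.fullmatch(name)
        if m:
            # group(1) is the condition prefix, group(2) the trailing number
            plan[name] = f"{m.group(1)}_zz_{m.group(2)}"
    return plan

plan = plan_renames(["c1c2c3_aa_3", "c1c3c11_aa_19", "c1c2c4_bb_41", "c1c2c3_aa_123"])
print(plan)
```

The resulting dictionary could then be fed to os.rename() once the mapping looks right.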
