Multiple regex substitutions

Question

I am using the following code to normalize a file's name:

import re

new_file = re.sub('[. ]', '_', old_file.lower())
new_file = re.sub('__+', '_', new_file)
new_file = re.sub('[][)(}{]', '', new_file)
new_file = re.sub('[-_]([^-_]+)$', r'.\1', new_file)

My question: is there a better way to write this code?

I found the following example:

def multiple_replace(dict, text):
    # Create a regular expression from the dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))

    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda mo: dict[mo.group(0)], text)

dict = {
    "Larry Wall" : "Guido van Rossum",
    "creator" : "Benevolent Dictator for Life",
    "Perl" : "Python",
}

But this code only works with plain strings: the `map(re.escape, ...)` in line 3 escapes every regex metacharacter, which "destroys" my regexes.
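To see what "destroys" means here, the following sketch (assuming Python 3) shows that `re.escape` backslash-escapes regex metacharacters, so a key such as `__+` only matches the literal three-character text `__+` rather than a run of underscores. The sample string `'a___b'` is illustrative, not taken from the question.

```python
import re

# re.escape backslash-escapes regex metacharacters such as '+'.
escaped = re.escape('__+')
print(escaped)  # __\+

# The escaped pattern matches only the literal text "__+", so nothing is replaced.
print(re.sub(escaped, '_', 'a___b'))  # a___b

# Unescaped, '+' is a quantifier and the run of underscores collapses to one.
print(re.sub('__+', '_', 'a___b'))  # a_b
```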

Regards,
Ray

Have you tried simply removing the `re.escape`, i.e. changing the offending line to `regex = re.compile("(%s)" % "|".join(dict))`? I haven't tried it, but I see no reason why it shouldn't work. — user4815162342, Dec 23 '14 at 07:50
I tried this already. In that case there is an error in line 5. Furthermore, in my example the order of the substitutions is important, and I found that a dictionary messes up that order. — Ray, Dec 23 '14 at 07:56
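The error in the `return regex.sub(...)` line is most likely a `KeyError`: the recipe's lambda uses the *matched text* as the dictionary key, which only works when each key matches nothing but itself. A minimal sketch, using a hypothetical `multiple_replace_unescaped` function (the recipe with `re.escape` removed) and an illustrative one-entry dictionary:

```python
import re

def multiple_replace_unescaped(replacements, text):
    # Hypothetical variant of the recipe: keys are joined as raw regexes (no re.escape).
    regex = re.compile("(%s)" % "|".join(replacements))
    # The matched text itself is used as the dictionary key.
    return regex.sub(lambda mo: replacements[mo.group(0)], text)

try:
    multiple_replace_unescaped({'__+': '_'}, 'a___b')
except KeyError as exc:
    # The pattern '__+' matched '___', but '___' is not a key of the dict.
    print('KeyError:', exc)  # KeyError: '___'
```

With plain-string keys, such as those in the recipe's own example dictionary, the matched text always equals a key, which is why the recipe works there.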

score 3 · Accepted Answer · answered Dec 23 '14 at 08:20

If you're simply looking for more maintainable and less repetitive code (as opposed to algorithmic change), go with a simple for loop:

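import re

SUBS = [
    # (pattern, replacement) pairs, applied in this order
    ('[. ]', '_'),
    ('__+', '_'),
    ('[][)(}{]', ''),
    ('[-_]([^-_]+)$', r'.\1'),
]

def normalize(name):
    name = name.lower()
    # Apply each substitution to the result of the previous one
    for pattern, replacement in SUBS:
        name = re.sub(pattern, replacement, name)
    return name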
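The sketch below (assuming Python 3) traces these substitutions step by step on a sample file name. `normalize_steps` is a hypothetical helper that takes the list as a parameter and returns every intermediate result; the sample names are illustrative. Running the list in reverse shows why the order Ray cares about matters. A list of tuples keeps that order explicit; plain dicts only guarantee insertion order from Python 3.7 on.

```python
import re

# Same (pattern, replacement) pairs as SUBS in the answer, in the same order.
SUBS = [
    ('[. ]', '_'),
    ('__+', '_'),
    ('[][)(}{]', ''),
    ('[-_]([^-_]+)$', r'.\1'),
]

def normalize_steps(subs, name):
    """Hypothetical helper: return the name after lowercasing and after each substitution."""
    steps = [name.lower()]
    for pattern, replacement in subs:
        steps.append(re.sub(pattern, replacement, steps[-1]))
    return steps

for step in normalize_steps(SUBS, 'My File (2014).Final.TXT'):
    print(step)

# The same substitutions in reverse order give a different result.
print(normalize_steps(SUBS, 'a. b.txt')[-1])                  # a_b.txt
print(normalize_steps(list(reversed(SUBS)), 'a. b.txt')[-1])  # a__b_txt
```

The loop prints `my file (2014).final.txt`, then `my_file_(2014)_final_txt` twice (the second substitution finds no double underscores), then `my_file_2014_final_txt`, and finally `my_file_2014_final.txt`.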

user4815162342

I would encourage OP to merge `('[. ]', '_')` and `('__+', '_')` into `('[. _]+', '_')`, which is effectively the same without becoming a horribly unreadable regex. – asontu Dec 23 '14 at 08:47
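A quick check of this suggestion, assuming Python 3. `two_step` and `merged` are hypothetical names for the answer's first two substitutions and the proposed single one; the sample strings are illustrative. The two are in fact equivalent for every input: the first step turns each dot or space into `_` without changing run lengths, and the second collapses every run of two or more underscores, which is exactly what replacing each maximal run of `[. _]` characters with one `_` does.

```python
import re

def two_step(name):
    # The answer's first two substitutions, applied in order.
    name = re.sub('[. ]', '_', name)
    return re.sub('__+', '_', name)

def merged(name):
    # The suggested single substitution: each run of dots, spaces and underscores becomes one '_'.
    return re.sub('[. _]+', '_', name)

samples = ['a. b', 'a_b', 'a._ .b', 'no_change', '...']
for s in samples:
    print(repr(s), two_step(s), merged(s))
```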
@user4815162342: Thanks for your solution. I like it and I will use it in my code... – Ray Dec 23 '14 at 10:20
