
I am using the following code to normalize a file's name:

new_file = re.sub('[. ]', '_', old_file.lower())
new_file = re.sub('__+', '_', new_file)
new_file = re.sub('[][)(}{]',  '', new_file)
new_file = re.sub('[-_]([^-_]+)$',  r'.\1', new_file)

My question: is there a better way to write this code?

I found the following example:

def multiple_replace(dict, text):
    # Create a regular expression  from the dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))

    # For each match, look-up corresponding value in dictionary
    return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text) 

dict = {
    "Larry Wall" : "Guido van Rossum",
    "creator" : "Benevolent Dictator for Life",
    "Perl" : "Python",
} 

But this code only works with literal strings. The map(re.escape, ...) call in line 3 escapes every key, which "destroys" any regex I put in the dictionary.
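To illustrate the point about re.escape, here is a minimal sketch using the first pattern from the code above. The variable names are made up for the illustration.

```python
import re

pattern = '[. ]'                 # a character class: a dot or a space
escaped = re.escape(pattern)     # every metacharacter is now a literal

# The raw pattern matches the space in "a b" ...
raw_hit = re.search(pattern, 'a b')
# ... but the escaped pattern does not, because it only matches
# the literal five-character text "[. ]".
escaped_hit = re.search(escaped, 'a b')
literal_hit = re.search(escaped, 'x[. ]y')

print(raw_hit is not None, escaped_hit is None, literal_hit is not None)
```

So with re.escape in place, a key such as '[. ]' is treated as plain text rather than as a pattern.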

Regards,
Ray

  • Have you tried simply removing the `re.escape`, i.e. changing the offending line to `regex = re.compile("(%s)" % "|".join(dict))`? I haven't tried it, but I see no reason why it shouldn't work. – user4815162342 Dec 23 '14 at 07:50
  • I tried this already. In that case there is an error in line 5. Furthermore, the order of the substitutions matters in my example, and I found that a dictionary does not preserve that order. – Ray Dec 23 '14 at 07:56
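One plausible reading of the error mentioned in the comment above: once re.escape is removed, the dictionary keys are patterns, but the lookup in the lambda uses the matched text, which is generally not one of those keys. A minimal sketch of that failure follows; the function name `multiple_replace_unescaped` and the one-entry mapping are hypothetical, chosen only for the demonstration.

```python
import re

def multiple_replace_unescaped(mapping, text):
    # Hypothetical variant of multiple_replace with re.escape removed.
    regex = re.compile("(%s)" % "|".join(mapping))
    # The lookup key is the matched text, not the pattern that matched it.
    return regex.sub(lambda mo: mapping[mo.group(0)], text)

subs = {'[. ]': '_'}

try:
    multiple_replace_unescaped(subs, 'a b')
    failed_key = None
except KeyError as exc:
    failed_key = exc.args[0]   # the matched text that was looked up

print(repr(failed_key))
```

Here the pattern '[. ]' matches the single space in 'a b', and the lookup of that space in the mapping raises KeyError. As for ordering, plain dicts only guarantee insertion order from Python 3.7 onwards, so a list of pairs is the safer container when the sequence matters.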

1 Answer


If you're simply looking for more maintainable and less repetitive code (as opposed to an algorithmic change), go with a simple for loop over an ordered list of (pattern, replacement) pairs:

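import re

# Substitutions are applied in order, so later patterns see earlier results.
SUBS = [
    ('[. ]', '_'),                 # dots and spaces become underscores
    ('__+', '_'),                  # collapse runs of underscores
    ('[][)(}{]', ''),              # drop brackets, parentheses and braces
    ('[-_]([^-_]+)$', r'.\1'),     # last separator becomes the extension dot
]

def normalize(name):
    name = name.lower()
    for pattern, replacement in SUBS:
        name = re.sub(pattern, replacement, name)
    return name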
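As a quick check of the loop above, here is a self-contained sketch that runs normalize on two file names. The sample names are invented for illustration and are not from the question.

```python
import re

SUBS = [
    ('[. ]', '_'),
    ('__+', '_'),
    ('[][)(}{]', ''),
    ('[-_]([^-_]+)$', r'.\1'),
]

def normalize(name):
    name = name.lower()
    for pattern, replacement in SUBS:
        name = re.sub(pattern, replacement, name)
    return name

# Made-up sample names; the second one contains a double space.
first = normalize('My File (Final).Version 2.TXT')
second = normalize('Holiday  Photos [2014].JPG')

print(first)   # my_file_final_version_2.txt
print(second)  # holiday_photos_2014.jpg
```

Because SUBS is a list, the substitutions run in exactly the order written, which is the property the dictionary-based approach could not guarantee.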
user4815162342
  • I would encourage OP to merge `('[. ]', '_')` and `('__+', '_')` into `('[. _]+', '_')` which is effectively the same without becoming a horribly unreadable regex. – asontu Dec 23 '14 at 08:47
  • @user4815162342: Thanks for your solution. I like it and I will use it in my code... – Ray Dec 23 '14 at 10:20
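The merge suggested in the first comment can be sketched as follows. This is only an illustration: the names `SUBS_ORIGINAL`, `SUBS_MERGED` and `apply_subs` are hypothetical, and the sample inputs are invented to compare the two lists.

```python
import re

# The four substitutions from the answer.
SUBS_ORIGINAL = [
    ('[. ]', '_'),
    ('__+', '_'),
    ('[][)(}{]', ''),
    ('[-_]([^-_]+)$', r'.\1'),
]

# The first two entries merged, as suggested in the comment:
# any run of dots, spaces and underscores becomes one underscore.
SUBS_MERGED = [
    ('[. _]+', '_'),
    ('[][)(}{]', ''),
    ('[-_]([^-_]+)$', r'.\1'),
]

def apply_subs(name, subs):
    # Hypothetical helper: same loop as normalize, with the list passed in.
    name = name.lower()
    for pattern, replacement in subs:
        name = re.sub(pattern, replacement, name)
    return name

samples = ['My File (Final).Version 2.TXT', 'a . _b.txt', 'x__y..z']
results = [(apply_subs(s, SUBS_ORIGINAL), apply_subs(s, SUBS_MERGED))
           for s in samples]

for original, merged in results:
    print(original, merged, original == merged)
```

On these samples both lists give the same result, which matches the comment's claim that the merged pattern is effectively the same.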