
I'm wondering if there's any way to combine patterns in a single re.sub() call instead of calling it multiple times, as below:

import re
s1 = "Please check with the store to confirm holiday hours."
s2 = ''' Hours:
            Monday: 9:30am - 6:00pm
Tuesday: 9:30am - 6:00pm
Wednesday: 9:30am - 6:00pm
Thursday: 9:30am - 6:00pm
Friday: 9:30am - 9:00pm
Saturday: 9:30am - 6:00pm
Sunday: 11:00am - 6:00pm

Please check with the store to confirm holiday hours.'''

strip1 = re.sub(s1, '', s2)
strip2 = re.sub('\t', '', strip1)
print(strip2)

Desired output:

Hours:
Monday: 9:30am - 6:00pm
Tuesday: 9:30am - 6:00pm
Wednesday: 9:30am - 6:00pm
Thursday: 9:30am - 6:00pm
Friday: 9:30am - 9:00pm
Saturday: 9:30am - 6:00pm
Sunday: 11:00am - 6:00pm
David Metcalfe
    If you want to use `s1` as a literal regex, you should call `re.escape` on it to prevent characters in it (like the trailing `.`) from being interpreted as regex special characters, and write hand-written patterns as raw string literals with an `r` prefix, e.g. `r'Please check ...'`, so backslashes reach the regex engine intact. If you want to remove each component word, you'd have to split it up and replace each part. – ShadowRanger Nov 11 '15 at 01:20
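To illustrate the comment's point: the trailing `.` in `s1` is a regex special that matches any character, so the unescaped string can match text it shouldn't. A minimal sketch; the `"...hours!"` test string is made up for illustration:

```python
import re

s1 = "Please check with the store to confirm holiday hours."
other = "Please check with the store to confirm holiday hours!"

# Unescaped, the final '.' matches ANY character, including '!'
print(bool(re.search(s1, other)))   # True

# re.escape turns '.' into '\.', so only a literal period matches
pat = re.escape(s1)
print(bool(re.search(pat, other)))  # False
print(bool(re.search(pat, s1)))     # True
```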

2 Answers


If you're just trying to delete specific substrings, you can combine the patterns with alternation for single-pass removal:

import re

pat1 = re.escape("Please check with the store to confirm holiday hours.")  # escape the literal '.'
pat2 = r'\t'
combined_pat = r'|'.join((pat1, pat2))
stripped = re.sub(combined_pat, '', s2)  # s2 as defined in the question

It's more complicated if the "patterns" use actual regex special characters (because then you need to worry about wrapping them to ensure the alternation breaks at the right places), but for fixed strings it's straightforward.
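For a list of fixed strings, one way to build the alternation safely is to escape each one with `re.escape`. Because Python's regex alternation is leftmost-first (with `ab|abc`, `ab` wins on the input `abc`), sorting longer literals first keeps a shorter literal from matching inside a longer one. A minimal sketch; the `literals` list and sample text are assumptions based on the question:

```python
import re

text = ("Hours:\n\tMonday: 9:30am - 6:00pm\n\n"
        "Please check with the store to confirm holiday hours.")

# Fixed strings to delete; re.escape makes every character literal
literals = ["Please check with the store to confirm holiday hours.", "\t"]

# Longest first, so a shorter literal can't win inside a longer one
combined = "|".join(map(re.escape, sorted(literals, key=len, reverse=True)))

result = re.sub(combined, "", text).strip()
print(result)
# Hours:
# Monday: 9:30am - 6:00pm
```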

If you had real regexes, rather than fixed patterns, you might do something like:

all_pats = [...]
combined_pat = r'|'.join(map(r'(?:{})'.format, all_pats))

so each pattern is wrapped in a non-capturing group `(?:...)` and any regex specials remain grouped without possibly "bleeding" across the alternation.
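A sketch of the grouped-alternation approach applied to the question's data, assuming two hypothetical "real" regexes: one for leading spaces or tabs on each line, and one for the disclaimer sentence with its `.` escaped. The `re.M` flag is needed so `^` matches at the start of every line, not just the start of the string:

```python
import re

s2 = (" Hours:\n\tMonday: 9:30am - 6:00pm\n\tTuesday: 9:30am - 6:00pm\n\n"
      "Please check with the store to confirm holiday hours.")

# Hypothetical patterns: leading whitespace per line, and the disclaimer
all_pats = [r"^[ \t]+", r"Please check with the store to confirm holiday hours\."]

# Wrap each pattern in a non-capturing group before joining with '|'
combined_pat = r"|".join(map(r"(?:{})".format, all_pats))

# re.M makes '^' match at the start of each line
result = re.sub(combined_pat, "", s2, flags=re.M).strip()
print(result)
# Hours:
# Monday: 9:30am - 6:00pm
# Tuesday: 9:30am - 6:00pm
```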

ShadowRanger
  • It seems your answer most directly addresses the regex part of my question, but I'm confused about **pat2**: I thought `r' '` would treat the string as raw, so my `\t` tab wouldn't work. Am I misunderstanding something? – David Metcalfe Nov 11 '15 at 01:35
    `r'\t'` and `'\t'` happen to work the same by coincidence. The latter looks for the literal tab character; the former is the regex escape `\t`, which the regex engine also interprets as a tab. It's the same end result. I'm just OCD about using raw strings: `r'\n'` and `r'\t'` work fine raw or non-raw, but if you search for `'\b'` instead of `r'\b'` (for example), you're looking for an ASCII backspace, not a word boundary, and you almost never want the former. – ShadowRanger Nov 11 '15 at 01:42
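A small demonstration of the difference described in the comment above; the sample strings `"a\tb"` and `"cat concatenate"` are made up for illustration:

```python
import re

text = "a\tb"
# '\t' (a real tab character) and r'\t' (regex escape for tab) match the same thing
tab_plain = re.sub('\t', '', text)
tab_raw = re.sub(r'\t', '', text)
print(tab_plain, tab_raw)  # ab ab

words = "cat concatenate"
# r'\b' is a regex word boundary; '\b' is the backspace character U+0008
print(re.findall(r'\bcat\b', words))  # ['cat']
print(re.findall('\bcat\b', words))   # []
```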

You're not even using regular expressions, so you may as well just chain `str.replace()` calls:

s1 = "Please check with the store to confirm holiday hours."
s2 = ''' Hours:
            Monday: 9:30am - 6:00pm
Tuesday: 9:30am - 6:00pm
Wednesday: 9:30am - 6:00pm
Thursday: 9:30am - 6:00pm
Friday: 9:30am - 9:00pm
Saturday: 9:30am - 6:00pm
Sunday: 11:00am - 6:00pm

Please check with the store to confirm holiday hours.'''

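# Remove the disclaimer and tabs, then trim surrounding whitespace ("Hours:" is kept, as in the desired output)
strip2 = s2.replace(s1, "").replace("\t", "").strip()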

print(strip2)
Jack
  • Ah, chaining. Hadn't thought of that. Can `re.sub()` be chained as well, for when I'm using actual regular expressions? – David Metcalfe Nov 11 '15 at 01:22
    @DavidMetcalfe No, because `re.sub` returns a string, which doesn't have a `sub` method. You could nest them, but that would get ugly fast. – Jack Nov 11 '15 at 01:28
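Rather than nesting `re.sub(..., re.sub(...))`, one common alternative is to keep the substitutions in a list and apply them in a loop, feeding each result into the next call. Unlike a single alternation with one replacement string, this also lets each pattern have its own replacement. A minimal sketch; the `substitutions` list is hypothetical and the sample text is based on the question:

```python
import re

s2 = (" Hours:\n\tMonday: 9:30am - 6:00pm\n\n"
      "Please check with the store to confirm holiday hours.")

# Hypothetical (pattern, replacement) pairs, applied in order
substitutions = [
    (r"Please check with the store to confirm holiday hours\.", ""),
    (r"\t", ""),
]

result = s2
for pattern, repl in substitutions:
    # Each pass runs re.sub on the output of the previous pass
    result = re.sub(pattern, repl, result)

result = result.strip()
print(result)
# Hours:
# Monday: 9:30am - 6:00pm
```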