Note: This question is different from "Does re in Python support word boundaries (\b)". The linked question asks something very simple, which a cursory glance at any Python regular-expression tutorial would have answered with examples. My question is about using a word boundary around an OR expression; it is far from trivial and should not be treated as a duplicate.

I was trying to build a palatable example to demonstrate regex word boundaries. To this end, I wanted to show how the singular food items ordered by a diet-conscious person change for a guzzler, and wrote the following program:

import re
items_lean = 'a masala dosa, an idli and a mango lassi'
pattern = r'{}'.format('an|a') # Use pattern as dynamic variable in regex
items_fat = re.sub(pattern, 'four', items_lean) # OOPS
print(items_fat)
pattern_fat = r'{}'.format('\ban\b|\ba\b') # Ensure a or an occurs as a word by itself
items_fat_proper = re.sub(pattern_fat, 'four', items_lean)
print(items_fat_proper)

I expected the following outputs, one for each print statement:

four mfoursfourlfour dosfour, four idli fourd four mfourgo lfourssi
four masala dosa, four idli and four mango lassi

But, what I got was:

four mfoursfourlfour dosfour, four idli fourd four mfourgo lfourssi
a masala dosa, an idli and a mango lassi

Where should the \b be placed to get the desired output?

Seshadri R
  • Why do you use `format` to create your string? Just use `r'\ban\b|\ba\b'`, that would have avoided this error. – Thierry Lathuille Nov 10 '19 at 09:50
  • Thanks. That is really simpler and less complicated too. Overambitious to use a Swiss knife, when a simple blade would have sufficed. – Seshadri R Nov 10 '19 at 09:59
  • @wiktor-stribizew Thanks for the comment. I have added a prefatory note to my question explaining why it is different from the one you referred to. – Seshadri R Nov 11 '19 at 04:00

1 Answer

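To satisfy the guzzlers, you need to either escape the backslashes (\\b) or use a raw string literal. In a normal string literal, \b is a backspace character, not a regex word boundary. That is: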

pattern_fat = r'\ban\b|\ba\b'

I've also removed the superfluous format call, which I suspect caused the confusion: the r prefix applied only to the '{}' template, not to the string passed to format.
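For illustration, here is a minimal sketch of the difference between the two literals, reusing items_lean from the question. The helper word_pattern is a hypothetical name, not something from the question; it is just one possible way to keep the pattern dynamic while wrapping the whole OR expression in a single pair of word boundaries.

```python
import re

items_lean = 'a masala dosa, an idli and a mango lassi'

# In a normal string literal, '\b' is one character: backspace (U+0008).
plain = '\ban\b|\ba\b'
# In a raw string literal, '\b' stays as backslash + 'b',
# which the re module interprets as a word boundary.
raw = r'\ban\b|\ba\b'

print(len('\b'), len(r'\b'))              # 1 2
print(re.sub(plain, 'four', items_lean))  # no backspaces in the text, so nothing changes
print(re.sub(raw, 'four', items_lean))    # replaces only the standalone words


def word_pattern(words):
    """Hypothetical helper: match any of `words` as a whole word.

    The non-capturing group lets one pair of \\b anchors cover
    the whole alternation instead of repeating them per word.
    """
    alternation = '|'.join(re.escape(w) for w in words)
    return r'\b(?:{})\b'.format(alternation)


items_fat_proper = re.sub(word_pattern(['an', 'a']), 'four', items_lean)
print(items_fat_proper)
```

Here the r prefix sits on the literal that actually contains \b, so format only fills in the alternation and never sees an unescaped backslash sequence.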

Joe Halliwell