
How can I simplify the following regular expression?

re.search(f'\W{word}\W', text) or re.search(f'^{word}\W', text) or re.search(f'\W{word}$', text) or word == text

That is, it should return True for any string that contains word preceded by \W or ^ and followed by \W or $.

Variants

  1. re.search(f'[^\W]{word}[\W$]', text)
  2. re.search(f'[\W^]{word}[\W$]', text)

don't work for my case.

The expression re.search(f'\W*{word}\W*', text) also gives wrong matches, for example on word + 'A'.

Any suggestions? Thank you!

Barmar
Che4ako
  • I think you're just looking for the `\b` pattern for a word boundary. – Barmar May 03 '23 at 16:16
  • `$` doesn't have special meaning inside `[]`. And `^` at the beginning of `[]` means to invert the match -- it matches anything *except* those characters. – Barmar May 03 '23 at 16:19
  • Why did you use `*` quantifiers after `\W`? That allows it to match nothing. – Barmar May 03 '23 at 16:20
  • [This answer](https://stackoverflow.com/a/6664167/5527985) is explaining word boundaries well. – bobble bubble May 04 '23 at 00:21
  • re.search(f"[\b\W]{word}[\b\W]", text) does not match the cases in the title of the question. The problem is not word boundaries. The problem is that the word can be at the start of the string, but the pattern \W{word}\W requires a character before the word. – Che4ako May 04 '23 at 06:59
  • @Che4ako The pattern is [`\bword\b`](https://regex101.com/r/oKXvtP/1) so you would use [`re.search(fr'\b{word}\b', text)`](https://tio.run/##JY2xCoAwDET3fEVwqYK4OPsnWRRbLGhSYkBF/PZKdbk73hsuXbYI9znHLYkaqgcwfxoOWB2iM7Gwx3@VRDuEmHgT/fFeAXxiQFfaASSNbLX6LkSex3Wtgzqa7mIfmlyL5aBpcn4B) a [word boundary](https://www.regular-expressions.info/wordboundaries.html) is an *anchor* (zero length) it matches at a position. – bobble bubble May 04 '23 at 10:20
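
To make the comments above concrete, here is a minimal sketch of the `\b` approach. The helper name `contains_word` is hypothetical, and `re.escape` is added on the assumption that `word` should be treated as literal text. The last two checks illustrate why the bracketed variants fail: inside `[]`, `$` is an ordinary character and `\b` means backspace, so the class always has to consume a real character.

```python
import re

def contains_word(word, text):
    """Return True if `word` occurs in `text` between word boundaries."""
    # re.escape keeps any regex metacharacters in `word` literal.
    return re.search(rf"\b{re.escape(word)}\b", text) is not None

print(contains_word("cat", "cat"))      # True: the word is the whole string
print(contains_word("cat", "a cat."))   # True: non-word characters around it
print(contains_word("cat", "catA"))     # False: followed by a word character

# A character class must consume one character, so these cannot
# match when the word sits at the very start or end of the string.
print(re.search(r"cat[\W$]", "cat"))        # None
print(re.search(r"[\b\W]cat[\b\W]", "cat")) # None
```

One caveat: `\b` only matches between a word character and a non-word character, so this sketch behaves differently from the `\W`-based patterns when `word` itself starts or ends with a non-word character (for example `C++`).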

1 Answer


There is no simple way to make ^ or $ optional patterns in Python's regexps.

I think the easiest way is to combine the three regexps into one, using the | operator inside the expression instead of an external or across three .search calls:

word = re.escape(ticker.lower())
result = re.search(rf"(^{word}\W)|(\W{word}\W)|(\W{word}$)", text)
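
As a possible variation on the same idea, the alternation can be moved into small non-capturing groups on each side of the word, which also covers the case where the text is exactly the word. This is only a sketch: the function name `has_delimited_word` is hypothetical, and it takes `word` and `text` as plain strings rather than the `ticker` variable used above.

```python
import re

def has_delimited_word(word, text):
    """True if `word` has (^ or \\W) before it and (\\W or $) after it."""
    # (?:^|\W) accepts start-of-string or one non-word character;
    # (?:\W|$) accepts one non-word character or end-of-string.
    pattern = rf"(?:^|\W){re.escape(word)}(?:\W|$)"
    return re.search(pattern, text) is not None

print(has_delimited_word("cat", "cat"))         # True
print(has_delimited_word("cat", "a cat!"))      # True
print(has_delimited_word("cat", "catA"))        # False
print(has_delimited_word("C++", "I like C++"))  # True
```

Unlike `\b`, this form does not depend on the word itself starting or ending with a word character.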
jsbueno