I am trying to match whole words using Python's re findall (or finditer) method.

>>> import re

>>> re.compile(r"\bSOMETHING\b").findall('this is the SOMETHING i am looking for')
['SOMETHING']  # expected

>>> re.compile(r"\bSOMETHING\b").findall('this is the SOMETHINGELSE i am looking for')
[]  # expected

>>> re.compile(r"\bSOMETHING\b").findall('this is the #SOMETHING i am looking for')
['SOMETHING']  # not expected; I wanted []

>>> re.compile(r"\b#SOMETHING\b").findall('this is the #SOMETHING i am looking for')
[]  # not expected; I wanted ['#SOMETHING']

I don't understand why adding # breaks the match. How can I match patterns that contain # or any other special character?

thanks

Cobry
  • `\b` means "word boundary", but it sounds like you and Python's regex engine disagree on the meaning of "word". – Joseph Sible-Reinstate Monica Jun 11 '19 at 21:02
  • Yeah, we probably are having a disagreement :) But when I compile "\b#SOMETHING\b", I expect any separate #SOMETHING to be returned. Am I correct? – Cobry Jun 11 '19 at 21:04
  • The answer is, as always: `\b` before a non-word char (like `#`) only matches if it is preceded by a word char. `\b#SOMETHING\b` will match in `a#SOMETHING` but never in `#SOMETHING` or `abc #SOMETHING`. Study [What is a word boundary in regexes?](https://stackoverflow.com/questions/1324676/what-is-a-word-boundary-in-regexes) and do not let yourself get confused by the term "word": it is far too ambiguous a term. – Wiktor Stribiżew Jun 11 '19 at 21:05
  • Use `#\b...` instead of `\b#...` – Rick Hitchcock Jun 11 '19 at 21:09
  • Got it, thanks! `#\b...` works perfectly. Thank you all for your explanations. – Cobry Jun 11 '19 at 21:20
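
To make the comments above concrete, here is a minimal sketch of the behaviour and of two possible workarounds. It assumes "a separate #SOMETHING" means "not directly attached to a letter, digit, or underscore on either side". The helper `whole_token` is a hypothetical name introduced only for this illustration, not something from the `re` module.

```python
import re

text = 'this is the #SOMETHING i am looking for'

# \b matches only between a word char (\w) and a non-word char.
# In " #SOMETHING" both the space and '#' are non-word chars,
# so there is no boundary in front of '#'.
no_match = re.findall(r"\b#SOMETHING\b", text)                    # []
glued = re.findall(r"\b#SOMETHING\b", 'a#SOMETHING here')         # ['#SOMETHING']

# Workaround 1 (from the comments): put \b after the '#'.
moved = re.findall(r"#\bSOMETHING\b", text)                       # ['#SOMETHING']

# Workaround 2 (an alternative, not from the thread): replace \b with
# lookarounds that only require "no word char next to the token".
def whole_token(token):
    """Hypothetical helper: match `token` when not adjacent to a word char."""
    return re.compile(r"(?<!\w)" + re.escape(token) + r"(?!\w)")

hashtag = whole_token('#SOMETHING')
found = hashtag.findall(text)                                     # ['#SOMETHING']
not_glued = hashtag.findall('a#SOMETHING')                        # []
not_longer = hashtag.findall('this is #SOMETHINGELSE')            # []

# For the third case in the question (plain SOMETHING should not match
# inside #SOMETHING), '#' can be excluded explicitly in the lookbehind.
bare = re.compile(r"(?<![#\w])SOMETHING(?!\w)")
bare_in_hashtag = bare.findall(text)                              # []
bare_plain = bare.findall('this is the SOMETHING i am looking for')  # ['SOMETHING']
```

Note that `#\bSOMETHING\b` still matches inside `a#SOMETHING`, whereas the lookaround version does not, so which one fits depends on what should count as a separate token.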
