
I'm learning regular expressions in Python 3, and using parentheses to extract groups is giving me unexpected behavior that I couldn't find explained anywhere.

This is the code:

import re

str = '<b>bold</b>'
match = re.search(r'>(\w+?)<', str)
match.group() == '>bold<'  # True, but I expected 'bold'

I've tried the following variations:

match = re.search(r'>(.+?)<', str)
match = re.search(r'>(.+)<', str)
match = re.search(r'>(,)<', str)
match = re.search(r'>([\w]+)<', str)

and they all return the same string. As far as I know, it should just return 'bold'. Can someone explain what I am doing wrong?

Thank you!

  • *Do not* name strings `str`, as it overrides a built-in type. – user3483203 May 05 '18 at 07:36
  • Use match.group(1). When you call group() without any arguments, it returns the text matched by the whole pattern, not the capture group that you specified using (\w+?). The whole pattern gets matched in search, which is why you are seeing `'>bold<'`. That's the pattern you looked for, not the capture group. – Jack Homan May 05 '18 at 07:36
  • Yep, this looks weird... A few off-topic recommendations for you: don't use `str` as a variable name, as `str()` is a Python built-in function, and don't ever use regex to parse HTML. – Andersson May 05 '18 at 07:36
  • Jack Homan is right, thank you very much! I'll take note about the str variable name, although it doesn't override a built-in type, as that is followed and preceded by two underscores in Python 3. – rand0MPrecisi0n May 05 '18 at 07:42
  • Try `str = 7; print(str(7))` to see that it does indeed overwrite your built-in function. – Mr. T May 05 '18 at 07:56
  • @rand0MPrecisi0n, your statement that str does not override a built-in type is actually not correct. str is a class that can be used to convert objects to strings, among other things. Once you have reassigned str, trying to get the string representation of any object ob using str(ob) raises a TypeError that says something along the lines of "'str' object is not callable". `__str__` is an overridable method of any class that is called by str(); it isn't the type. Just go into the interpreter and type `dir(__builtins__)` and you'll see str in there. – Jack Homan May 05 '18 at 08:51
  • Thanks for the clarification. – rand0MPrecisi0n May 05 '18 at 17:42
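
For anyone landing here later, the point made in the comments can be checked with a minimal sketch like the one below. It reuses the pattern and input from the question; the variable names `text`, `whole` and `inner` are my own choices for illustration (with `text` replacing `str` so the built-in is not shadowed).

```python
import re

# Same input as in the question, stored under a name that does not shadow str.
text = '<b>bold</b>'

match = re.search(r'>(\w+?)<', text)

# group() with no argument is the same as group(0): the entire matched text.
whole = match.group()

# group(1) is only what the first pair of parentheses captured.
inner = match.group(1)

print(whole)           # >bold<
print(inner)           # bold
print(match.groups())  # ('bold',)
```

If the pattern might not match at all, `re.search` returns `None`, so checking `match` before calling `group` avoids an AttributeError.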
