1

I'm trying to extract a substring (in this case 'Danger Zone Case'), but the code raises an error:

>>> import re
>>> res = 'class=\\"market_listing_item_name\\" style=\\"color: #D2D2D2;\\">Danger Zone Case<\\/span>'
>>> item = re.search('market_listing_item_name\\.+?>(.+?)<', res).group(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

However, when I enter both on https://regexr.com/, I get a match. What am I doing wrong?

Potheker
  • Can you provide the relevant input that it is trying to match against? Also, keep in mind that regex works differently in JS and Python, and regexr defaults to JS regex – Kwright02 Jun 06 '21 at 22:29
  • @Kwright02 that's not the issue, you can see the [same matching with Python regex](https://pythex.org/?regex=market_listing_item_name%5C%5C.%2B%3F%3E(.%2B%3F)%3C&test_string=class%3D%5C%5C%22market_listing_item_name%5C%5C%22%20style%3D%5C%5C%22color%3A%20%23D2D2D2%3B%5C%5C%22%3EDanger%20Zone%20Case%3C%5C%5C%2Fspan%3E&ignorecase=0&multiline=0&dotall=0&verbose=0) as well – Nir Alfasi Jun 06 '21 at 22:33
  • `market_listing_item_name\\.` That is looking for the string `market_listing_item_name` followed by a literal period, which explains why it did not find a match. – John Gordon Jun 06 '21 at 22:35
  • This should give you your answer: `pattern = re.compile(r'(>\w* .*<)'); item = pattern.search(res); item.group(0)` – Ade_1 Jun 06 '21 at 22:45

1 Answer

3

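The backslashes in 'market_listing_item_name\\.+?>(.+?)<' are processed by Python as an escape sequence before the regex engine ever sees them: the two backslashes collapse into one, so the regex becomes \. (a literal period) instead of a literal backslash followed by a wildcard. To tell Python to pass the backslashes through as literal characters, use a raw string: r'market_listing_item_name\\.+?>(.+?)<'.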

A good tip is to always prefix pattern strings with r when writing regular expressions in Python; it helps avoid quite a few headaches :)
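
As a quick check, here is a minimal sketch that runs both versions of the pattern against the `res` string from the question. The variable names `non_raw`, `raw`, `no_match`, and `item` are only illustrative and do not appear in the original post.

```python
import re

# Same input as in the question. In a normal (non-raw) literal,
# each \\ is stored as a single backslash.
res = 'class=\\"market_listing_item_name\\" style=\\"color: #D2D2D2;\\">Danger Zone Case<\\/span>'

# Non-raw: the regex engine receives \. (a literal period).
non_raw = 'market_listing_item_name\\.+?>(.+?)<'
# Raw: the regex engine receives \\ (a literal backslash), then the wildcard .+?
raw = r'market_listing_item_name\\.+?>(.+?)<'

print(non_raw)  # market_listing_item_name\.+?>(.+?)<
print(raw)      # market_listing_item_name\\.+?>(.+?)<

no_match = re.search(non_raw, res)
match = re.search(raw, res)
item = match.group(1) if match else None

print(no_match)  # None
print(item)      # Danger Zone Case
```

Printing the pattern before passing it to `re.search` is a cheap way to see exactly what the regex engine will receive.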

Seon
  • If you're using raw strings, then you don't need two backslashes. – John Gordon Jun 06 '21 at 22:37
  • 1
    @JohnGordon Note that there are two backslashes in the actual string we are matching. – Kraigolas Jun 06 '21 at 22:40
  • 1
    @Kraigolas Are you sure? It's not a raw string in the OP, so the first backslash is treated as an escape. Try it yourself: `print('class=\\"market_listing_item_name\\"')` yields `class=\"market_listing_item_name\"` – John Gordon Jun 06 '21 at 22:41
  • @JohnGordon After a bit of experimenting, it turns out you're correct in that `res` evaluates to having only a single backslash. However, if we remove the double backslash from `'market_listing_item_name\\.+?>(.+?)<'`, we are left with `'market_listing_item_name\.+?>(.+?)<'`. This fails, and I believe it is because in regex, that backslash is now escaping the `\.`, which means we are now matching periods instead of the `.` being a wildcard, although I remain open to any additional input you might have. – Kraigolas Jun 06 '21 at 22:48
  • Thank you, it works! Although I still don't quite understand why mine didn't work. The RegEx has an escaped backslash after "...item_name" so it should search for "market_listing_item_name\" followed by the other stuff and shouldn't that match the res string? (no matter actually whether the backslashes in the res string count as escaped or not) – Potheker Jun 07 '21 at 13:53
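
On the follow-up question above, it may help to see the two layers of escaping separately. The sketch below uses two made-up inputs (`with_periods` and `with_backslash` are hypothetical, not from the thread) to show what the original non-raw pattern actually matches.

```python
import re

# The non-raw pattern from the question.
pattern = 'market_listing_item_name\\.+?>(.+?)<'

# Layer 1: Python's string-literal parser turns \\ into one backslash.
print(len('\\'))  # 1

# Layer 2: the regex engine therefore sees \. and looks for literal periods.
with_periods = 'market_listing_item_name...>Danger Zone Case<'     # hypothetical input
with_backslash = 'market_listing_item_name\\">Danger Zone Case<'  # hypothetical input

period_hit = re.search(pattern, with_periods)
backslash_hit = re.search(pattern, with_backslash)

print(period_hit.group(1))  # Danger Zone Case
print(backslash_hit)        # None
```

So the pattern was never searching for a backslash after "item_name": the single backslash that survives the string literal is consumed by the regex engine to escape the period.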