re.findall() sometimes returns all matches, sometimes only the last one

Asked Feb 08 '20 at 22:29

Active Feb 08 '20 at 22:29

Viewed 24 times

This returns only the last occurrence in the data:

re.findall(r'data-hovercard-id="[\w\W]*"><span>([\w\W]*)</span></a', data)

Whereas this returns all occurrences in the data:

re.findall(r'class="business-attribute[\s]price-range">([\W]*)</', data)

I can't share the data for privacy reasons, but I am scraping an old static HTML page from Yelp. I'm not sure whether the page structure is still the same.

Any ideas? I'm trying to get all occurrences with both patterns.

asked Feb 08 '20 at 22:29

Jamalan


`"[\w\W]*"` needs to be non-greedy i.e. `"[\w\W]*?"` or better yet use `"[^"]*"` – Nick Feb 08 '20 at 22:32

Given that these are two different regexes and you can't show us the actual content they are matched against, at best we can speculate. For example, one general guess is that your HTML elements span multiple lines and your first regex doesn't account for that. On that note, you should be using a DOM parser, not regex, for this sort of thing. – CrayonViolent Feb 08 '20 at 22:34
Nick seems to be right, but in case the linked question doesn't help, please ask a new question and include a minimal reproducible example. BTW, isn't `[\w\W]` the same as `.`? – wjandrea Feb 08 '20 at 22:54
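
To illustrate the greedy vs. non-greedy point from the comments, here is a minimal sketch. The `data` string is made-up sample markup (the real Yelp HTML was never shared), so treat the names and structure as assumptions. It also suggests why the second pattern in the question probably finds every occurrence: `[\W]*` cannot cross word characters such as tag names, so it cannot run past one element into the next. As for `[\w\W]` versus `.`: they differ by default, because `.` does not match a newline unless `re.DOTALL` is set.

```python
import re

# Hypothetical sample markup; the structure of the real page is an assumption.
data = (
    '<a data-hovercard-id="abc"><span>First Cafe</span></a>\n'
    '<a data-hovercard-id="def"><span>Second Diner</span></a>\n'
)

# Greedy: [\w\W]* runs to the end of the input and backtracks, so one single
# match swallows everything from the first attribute to the last </span></a.
greedy = re.findall(
    r'data-hovercard-id="[\w\W]*"><span>([\w\W]*)</span></a', data
)

# Non-greedy: *? stops at the earliest point where the rest can match.
lazy = re.findall(
    r'data-hovercard-id="[\w\W]*?"><span>([\w\W]*?)</span></a', data
)

# Negated character classes: cannot run past the closing quote or next tag.
negated = re.findall(
    r'data-hovercard-id="[^"]*"><span>([^<]*)</span></a', data
)

print(greedy)   # ['Second Diner']
print(lazy)     # ['First Cafe', 'Second Diner']
print(negated)  # ['First Cafe', 'Second Diner']

# `.` skips newlines by default, while [\w\W] matches any character.
dot_default = re.findall(r'a.b', 'a\nb')
dot_all = re.findall(r'a.b', 'a\nb', flags=re.DOTALL)
any_char = re.findall(r'a[\w\W]b', 'a\nb')
```

The negated-class version is usually the safer of the two fixes, since it fails fast on unexpected markup instead of scanning ahead. For anything beyond a quick one-off, a real HTML parser is the more robust route, as CrayonViolent suggests.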

0 Answers