Python Regular Expressions - Limit Results?

Question

I feel kind of stupid asking this, but I have written a few regular expressions to find specific businesses, addresses, and URLs in an HTML document. The problem is that I don't know which function in Python's re module I should use. When I use re.findall, I get 30 to 90 results. I want to limit that to a fixed number, say 3 or 5. Which function should I use for this, or is there a parameter that stops the search once it has found a certain number of results?

Also, is there a faster way of searching an HTML document, so that my program isn't slowed down by regular expressions scanning this really long string of text?

Thanks.

EDIT

I have Beautiful Soup and I've used it to just make things easier to read...but not to parse.

I've also used lxml...which is better/faster?

my bad for posting an answer, you [shouldn't parse HTML with regex](http://stackoverflow.com/a/1732454/1219006), use a parser — jamylak, Aug 10 '12 at 13:14
What about fetching the HTML page and reading it line by line with a regexp? If I understand you correctly, you read the whole HTML page at once? Correct me if I'm wrong. I can describe how I parse a page with regexps if you need. — Ishikawa Yoshi, Aug 10 '12 at 13:26

score 1 · Accepted Answer · answered Aug 11 '12 at 01:11


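Instead of using re.findall, use re.finditer. It returns an iterator that yields match objects on demand, so you can stop as soon as you have as many matches as you need instead of collecting every match in the string first.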

Here's an example:

>>> import re
>>> [m.group(0) for m, _ in zip(re.finditer(r"\w", "abcdef"), range(3))]
['a', 'b', 'c']
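The same idea can also be written with itertools.islice, which some people find easier to read than the zip/range trick. This is only a sketch: first_matches is a made-up helper name, not something from the re module, and the pattern and input are the same toy values as above.

```python
import re
from itertools import islice


def first_matches(pattern, text, limit):
    """Return at most `limit` matched strings (hypothetical helper)."""
    # re.finditer is lazy: it only searches for the next match when asked.
    matches = re.finditer(pattern, text)
    # islice stops pulling from the iterator after `limit` items.
    return [m.group(0) for m in islice(matches, limit)]


print(first_matches(r"\w", "abcdef", 3))  # ['a', 'b', 'c']
```

If the text contains fewer matches than the limit, you simply get a shorter list back rather than an error.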


MRAB

