
I'm using the following regular expression in a web scraping program. It scrapes the HTML for a bulleted list, but it only grabs the first bullet and leaves the other 9 behind.

How could I modify it to grab all 10 bullets?

<li>\s*<span\s+class=\"a-list-item\">(.*?)<\/span>\s*<\/li>

Thank you for any help.

Damiens
  • Which language are you running? – Avinash Raj Dec 05 '14 at 18:06
  • How are you running that regex? Are you running it on Windows, Linux, etc.? Some additional info will help. – Marco Dec 05 '14 at 18:09
  • possible duplicate of [repeating multiple characters regex](http://stackoverflow.com/questions/3630982/repeating-multiple-characters-regex) – Robert P Dec 05 '14 at 18:18
  • Welcome to Stack Overflow! This is a pretty well formatted question, but there are others like it. [Consider looking at this answer](http://stackoverflow.com/questions/3630982/repeating-multiple-characters-regex) for additional information on doing repeated regexes. – Robert P Dec 05 '14 at 18:21
  • [Don't parse HTML with regex!](http://stackoverflow.com/a/1732454/418066) – Biffen Dec 05 '14 at 21:35

1 Answer


With regular expressions, you can require that a pattern be repeated a specific number of times by following it with a {} quantifier. The quantifier can be applied to a whole group, and you can have as many groups as you want. So, you could do:

(<li>\s*<span\s+class=\"a-list-item\">(.*?)<\/span>\s*<\/li>){10}

Or, if you need more or fewer, something like:

(<li>\s*<span\s+class=\"a-list-item\">(.*?)<\/span>\s*<\/li>){1,10}

(This answer assumes the rest of your string happens to be a legal regex for your regex interpreter. Modify as appropriate if not.)
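As a rough illustration, here is a minimal Python sketch of how the quantified pattern behaves. Python is an assumption, since the question never says which language is in use, and the sample HTML is made up (three bullets instead of ten). It also adds a `\s*` between repetitions, which the patterns above do not have, because list items in real markup are usually separated by newlines. Note that in Python, as in many other engines, a repeated capturing group keeps only the text of its last repetition, so the sketch also shows a global search with `re.findall`, which returns one capture per bullet.

```python
import re

# Hypothetical sample markup; the real page in the question has ten bullets.
html = """
<ul>
<li><span class="a-list-item">First</span></li>
<li> <span class="a-list-item">Second</span> </li>
<li><span class="a-list-item">Third</span></li>
</ul>
"""

# The pattern from the question, kept verbatim in a raw string.
item = r'<li>\s*<span\s+class=\"a-list-item\">(.*?)<\/span>\s*<\/li>'

# Quantified group: matches the whole run of bullets in one go.
# The trailing \s* (an assumption) lets the items be separated by newlines.
run = re.search(r'(?:' + item + r'\s*){1,10}', html, flags=re.DOTALL)

# The inner capturing group only remembers its final repetition.
last_only = run.group(1)

# Global search: one capture per bullet, however many there are.
bullets = re.findall(item, html, flags=re.DOTALL)

print(last_only)
print(bullets)
```

If the goal is to collect the text of every bullet, the global search is usually the simpler route; the quantified form is more useful for checking that a run of exactly N items is present.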

Robert P