For the following generated HTML:

<a href="http://google.com">google.com</a>
Static text
<a href="http://Bing.com">bing.com</a>
<a href="http://google.com">google.com</a>
Static text
<a href="http://Bing.com">bing.com</a>
<a href="http://Bing.com">bing.com</a>

I need to match (pseudo-pattern):

(Single URL) \nStatic text\n (Multiple URLs)

Like:

MATCH 1
1. 'http://google.com'
2. 'http://Bing.com'

MATCH 2
1. 'http://google.com'
2. 'http://Bing.com'
3. 'http://Bing.com'

For this example, how do I repeat a captured group?

My attempt:

(http:\/\/[a-zA-Z.-_]{1,}){1}[">.a-zA-Z<>\/]{1,}\nStatic text\n[">.a-zA-Z<>\/ =]{1,}(http:\/\/[a-zA-Z.-_]{1,}){1,}

This (see also https://regex101.com/r/cJ2wJ6/5) does not seem to do the trick. Where am I going wrong?

JasperJ
  • What you quote of your 'tried' regex doesn't match the anchor tag markup properly. You've also not allowed for `https` URLs, but that's trivia. You need to look hard at the material in the square brackets before the second `http:` section, and you need something along those lines after it too (you've not allowed for the newlines in the data you show either, for instance). – Jonathan Leffler Jun 23 '15 at 04:52
  • don't use regex to parse html. – Avinash Raj Jun 23 '15 at 04:53
  • `(?:CAPTURING_GROUP)*` – Downgoat Jun 23 '15 at 05:25
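
One way to read the comments above, sketched in Python as an illustration rather than a definitive answer: in Python's `re` module a quantified capturing group such as `(...){1,}` keeps only its last iteration, so a single pattern cannot return every repeated URL on its own. A possible workaround is two passes: a non-capturing repeated group delimits each block, then a second pattern pulls the URLs out of that block. The names `BLOCK`, `URL` and `extract_matches` are hypothetical, and the sketch assumes every anchor sits on its own line ending in a newline, exactly as in the sample.

```python
import re

# Sample input copied from the question; the trailing newline is an assumption.
html = (
    '<a href="http://google.com">google.com</a>\n'
    'Static text\n'
    '<a href="http://Bing.com">bing.com</a>\n'
    '<a href="http://google.com">google.com</a>\n'
    'Static text\n'
    '<a href="http://Bing.com">bing.com</a>\n'
    '<a href="http://Bing.com">bing.com</a>\n'
)

# Pass 1: one anchor (URL in group 1), the static line, then one or more
# anchors (whole run in group 2). The negative lookahead stops the run
# before an anchor that is followed by "Static text", because that anchor
# is the single URL opening the next block.
BLOCK = re.compile(
    r'<a href="(https?://[^"]+)">[^<]*</a>\n'
    r'Static text\n'
    r'((?:<a href="https?://[^"]+">[^<]*</a>\n(?!Static text))+)'
)

# Pass 2: collect every URL inside the repeated run.
URL = re.compile(r'href="(https?://[^"]+)"')


def extract_matches(text):
    """Return one list of URLs per (single URL, static text, URLs) block."""
    return [
        [m.group(1)] + URL.findall(m.group(2))
        for m in BLOCK.finditer(text)
    ]


for number, urls in enumerate(extract_matches(html), start=1):
    print("MATCH", number, urls)
```

For markup less regular than this generated sample, an HTML parser is likely to be more robust than any regex, as one of the comments suggests.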

0 Answers