I'm trying to use the regex:

<td>(.*)<\/td><td>(.*)<\/td>

To match the data from this:

<td>over079</td><td>37.123.86.116</td></tr><tr><td>1346968</td><td>rektheace</td><td></td></tr><tr><td>1346967</td><td>rektheace</td><td>173.245.67.214</td>

So that I can extract each value, but it seems to return the whole string:

<td>over079</td><td>37.123.86.116</td></tr><tr><td>1346968</td><td>rektheace</td><td></td></tr><tr><td>1346967</td><td>rektheace</td><td>173.245.67.214</td>

Any reason why?

JustLloyd

2 Answers

<td>(.*?)<\/td><td>(.*?)<\/td>

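Make the * quantifiers non-greedy (*?) so that each group matches as little as possible.

Alternatively, to keep the first group from running past a </td> and to require that the second cell look like an IPv4 address, use: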

<td>((?:(?!<\/td>).)*)<\/td><td>(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)<\/td>

See the demo:

https://regex101.com/r/aD7aH2/3
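
A minimal Python sketch of the second pattern, applied to the sample from the question. Python's re module is an assumption here (the question does not name a language), and the names html, pattern and rows are illustrative only.

```python
import re

# Sample markup copied from the question.
html = (
    "<td>over079</td><td>37.123.86.116</td></tr><tr><td>1346968</td>"
    "<td>rektheace</td><td></td></tr><tr><td>1346967</td>"
    "<td>rektheace</td><td>173.245.67.214</td>"
)

# Group 1: cell text that cannot run past a closing </td>.
# Group 2: four dot-separated runs of one to three digits.
pattern = re.compile(
    r"<td>((?:(?!</td>).)*)</td>"
    r"<td>(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)</td>"
)

# findall returns one (group 1, group 2) tuple per match.
rows = pattern.findall(html)
for name, ip in rows:
    print(name, ip)
```

Only cell pairs whose second cell looks like an address are returned, so the row with the empty cell is skipped. Note that \d{1,3} also accepts values such as 999, so this checks the shape of the address rather than whether it is a valid one.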

vks
  • Thanks! I'm trying to validate the IP address as well, but it seems to be returning every single bit of data: (.*?)<\/td>(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)<\/td> – JustLloyd May 02 '15 at 19:00

The first (.*) is greedy: it matches as much as it possibly can while still letting the rest of the pattern match, so it swallows everything up to the last <td>. The second (.*) then matches the contents of that last <td>.

The pattern matches the whole string. The groups match:

over079</td><td>37.123.86.116</td></tr><tr><td>1346968</td><td>rektheace</td><td></td></tr><tr><td>1346967</td><td>rektheace

and

173.245.67.214

respectively.
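
The greedy behaviour can be reproduced with a short sketch. Python's re module is assumed here, since the question does not say which language is in use, and the names html and greedy are illustrative.

```python
import re

# Sample markup copied from the question.
html = (
    "<td>over079</td><td>37.123.86.116</td></tr><tr><td>1346968</td>"
    "<td>rektheace</td><td></td></tr><tr><td>1346967</td>"
    "<td>rektheace</td><td>173.245.67.214</td>"
)

# Both groups use the greedy quantifier .*
greedy = re.search(r"<td>(.*)</td><td>(.*)</td>", html)

print(greedy.group(0) == html)  # the overall match spans the whole string
print(greedy.group(1))          # everything up to the last cell
print(greedy.group(2))          # contents of the last cell only
```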

If you want the groups to match the minimum possible, make them non-greedy with (.*?). The pattern will then match

<td>over079</td><td>37.123.86.116</td>

and the groups will match the contents of the first two <td>s.
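
A sketch of the non-greedy version, again assuming Python's re module (the question does not name a language); html, lazy and pairs are illustrative names.

```python
import re

# Sample markup copied from the question.
html = (
    "<td>over079</td><td>37.123.86.116</td></tr><tr><td>1346968</td>"
    "<td>rektheace</td><td></td></tr><tr><td>1346967</td>"
    "<td>rektheace</td><td>173.245.67.214</td>"
)

lazy_pattern = r"<td>(.*?)</td><td>(.*?)</td>"

# First match only: each group now stops as early as it can.
lazy = re.search(lazy_pattern, html)
print(lazy.group(0))
print(lazy.groups())

# Every non-overlapping match, as (group 1, group 2) tuples.
pairs = re.findall(lazy_pattern, html)
for first, second in pairs:
    print(repr(first), repr(second))
```

When the same pattern is applied repeatedly, a lazy group can still run across tags if that is the only way to reach the next </td><td>. On this sample the third pair starts at the empty cell, so its first group contains markup rather than a single cell's text.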

CupawnTae
  • Thanks for your post. When I do this and make it display the first group, it displays most of that line for some reason. – JustLloyd May 02 '15 at 19:01