I have two related questions.

First, I have the following HTML text:

<td style="work" class="sort"> 1500 </td>

I would like to extract the 1500, preferably with no spaces around it, although I could always trim afterwards.

I'm testing on regex101, and here is what I have so far:

>.*?<\/td>

It returns:

> 1500 </td>, which is close to what I want. I could clean this up manually afterwards, but I would prefer to get back only 1500.


Second question:

If I have the following html text:

<td style="work"> <a class="link" href="/img"> Lake </a> </td>

How can I parse this to get back Lake? If I use the regex >.*?<\/a>, I would get back

> <a class="link" href="/img"> Lake </a>, which is more than I want.

K Split X

1 Answer

Use Parentheses for Grouping and Capturing

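First check for the pattern >([^<]*)<\/a> . If that doesn't match, check for >([^<]*)<\/td> . The text you want is then in capture group 1. The class [^<]* is used instead of .*? because a lazy .*? still starts at the first > in the input, so for the second example it would capture the opening <a ...> tag as well.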
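
As a minimal sketch of the two-step check in Python (the function name `extract_text` and the `PATTERNS` tuple are made up for illustration, and the inputs are assumed to be single cells like the ones in the question): each pattern has one capturing group, and the captured text is trimmed afterwards. The character class `[^<]*` is used rather than `.*?`, since a lazy `.*?` still begins at the first `>` on the line and would pull in the opening `<a ...>` tag too.

```python
import re

# Tried in order: the more specific </a> case first, then the plain </td> case.
# [^<]* cannot cross a "<", so the capture holds text only, never a nested tag.
PATTERNS = (
    re.compile(r">([^<]*)</a>"),
    re.compile(r">([^<]*)</td>"),
)


def extract_text(html):
    """Return the trimmed text captured by the first matching pattern, or None."""
    for pattern in PATTERNS:
        match = pattern.search(html)
        if match:
            # group(1) is the part inside the parentheses; strip() drops the spaces.
            return match.group(1).strip()
    return None


print(extract_text('<td style="work" class="sort"> 1500 </td>'))
print(extract_text('<td style="work"> <a class="link" href="/img"> Lake </a> </td>'))
```

Trimming with `strip()` keeps the patterns simple; the whitespace could also be handled inside the regex with `\s*` on both sides of the group.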

However, parsing HTML with regex is not recommended. Read about it here.
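
For comparison, here is a rough sketch of the parser-based route using Python's standard-library `html.parser` module. The class name `CellTextParser` and the helper `cell_texts` are hypothetical names chosen for this example, and it assumes `<td>` elements are not nested inside one another.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collects the trimmed text content of every <td> element it sees."""

    def __init__(self):
        super().__init__()
        self._in_cell = False  # True while the parser is inside a <td>
        self._parts = []       # text fragments of the current cell
        self.cells = []        # finished cell texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._in_cell = False
            # Text from nested tags such as <a> is joined, then trimmed once.
            self.cells.append("".join(self._parts).strip())

    def handle_data(self, data):
        if self._in_cell:
            self._parts.append(data)


def cell_texts(html):
    """Return a list with the text of each <td> found in html."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.cells


print(cell_texts('<td style="work" class="sort"> 1500 </td>'))
print(cell_texts('<td style="work"> <a class="link" href="/img"> Lake </a> </td>'))
```

Because the parser tracks tags itself, the same code handles both examples from the question without a separate pattern for the `<a>` case.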

Edit: MDR's solution (^.*?> (\w+) <.*?$) works if you only want to capture word characters.
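
A short Python sketch of how that pattern behaves (the function name `extract_word` is made up for illustration). Since the outer parentheses form group 1, the word itself ends up in group 2. The pattern expects exactly one space on each side of a single `\w+` run, so text containing a space does not match.

```python
import re

# The pattern quoted above. Group 1 is the whole line, group 2 is the word.
WORD_PATTERN = re.compile(r"(^.*?> (\w+) <.*?$)")


def extract_word(html):
    """Return the single word between '> ' and ' <', or None if nothing matches."""
    match = WORD_PATTERN.search(html)
    return match.group(2) if match else None


print(extract_word('<td style="work" class="sort"> 1500 </td>'))
print(extract_word('<td style="work"> <a class="link" href="/img"> Lake </a> </td>'))
print(extract_word('<td> Lake Tahoe </td>'))  # two words, so no match
```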

Dum