Parse this HTML table with regex

Question

I have two related questions:

I have the following HTML text:

<td style="work" class="sort"> 1500 </td>

I would like to extract the 1500, preferably without the surrounding spaces, although I could always trim them afterwards.

I'm testing on regex101, and here is what I have so far:

>.*?<\/td>

It returns:

> 1500 </td>, which is close to what I want. I could strip the extra characters afterwards, but I would prefer to get back only 1500.
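A minimal JavaScript sketch of the capturing-group idea, assuming the snippet is a single string and the cell holds plain text. `cellText` is a hypothetical helper name. The parentheses capture only what sits between the `>` and the `</td>`, and the `\s*` on either side of the group keeps the spaces out of it:

```javascript
// Hypothetical helper: returns the cell text without surrounding spaces,
// or null when there is no "...>text</td>" in the string.
function cellText(html) {
  // \s* outside the group swallows the spaces; (.*?) is capture group 1.
  const m = />\s*(.*?)\s*<\/td>/.exec(html);
  return m ? m[1] : null;
}

console.log(cellText('<td style="work" class="sort"> 1500 </td>')); // 1500
```

Without the `g` flag, `html.match(re)` returns the same array as `re.exec(html)`, so `m[1]` works with either call.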

Second question:

If I have the following HTML text:

<td style="work"> <a class="link" href="/img"> Lake </a> </td>

How can I parse this to get back just Lake? If I use the regex >.*?<\/a>, I get back

> <a class="link" href="/img"> Lake </a>, which is more than I want.
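Laziness alone does not help here: the engine reports the leftmost match, which begins at the first `>` in the string (the end of `<td style="work">`), and `.` happily matches the `<` and `>` of the `<a ...>` tag. A sketch, assuming the link text contains no nested tags, that excludes angle brackets from the captured text; `linkText` is a hypothetical helper:

```javascript
// Hypothetical helper: text of the first link whose content has no nested tags.
function linkText(html) {
  // [^<>] cannot cross "<a ...>", so the match has to start at the ">"
  // that closes the <a> opening tag; \s* keeps the spaces out of group 1.
  const m = />\s*([^<>]*?)\s*<\/a>/.exec(html);
  return m ? m[1] : null;
}

const row = '<td style="work"> <a class="link" href="/img"> Lake </a> </td>';
console.log(linkText(row)); // Lake

// For comparison, the lazy dot still starts at the first ">" in the string,
// so group 1 is ' <a class="link" href="/img"> Lake ' (note the spaces).
console.log(/>(.*?)<\/a>/.exec(row)[1]);
```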

I don't understand the downvote. I asked a reasonable question, provided my thoughts, and illustrated the problem. - K Split X, Mar 18 '20 at 14:49
It's just `match` in JavaScript: `x = ' 1500 '` `x.match(/>.*?<\/td>/g)` - K Split X, Mar 18 '20 at 14:57
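One detail about the `match` call in the comment above: with the `g` flag, `String.prototype.match` returns every full match and discards capture groups. `String.prototype.matchAll` (ES2020, requires the `g` flag) keeps them. A sketch using a made-up two-cell string:

```javascript
// Made-up input with two cells, to show the difference between the calls.
const rowHtml = '<td class="sort"> 1500 </td><td class="sort"> 2300 </td>';
const re = />\s*([^<>]*?)\s*<\/td>/g;

// match() with the g flag: whole matches only, groups are dropped.
// Value: ['> 1500 </td>', '> 2300 </td>']
console.log(rowHtml.match(re));

// matchAll() yields one match array per hit; m[1] is capture group 1.
const values = Array.from(rowHtml.matchAll(re), (m) => m[1]);
console.log(values); // values: ['1500', '2300']
```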

Dum · Answer 1 · 2020-03-18T16:25:42.443


Use Parentheses for Grouping and Capturing

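First try the pattern >([^<>]*)<\/a>. If that doesn't match, try >([^<>]*)<\/td>. The parentheses create a capturing group, so group 1 holds only the text between the tags (still with its surrounding spaces, which you can trim). Use [^<>]* rather than .*? inside the group: . also matches < and >, so on your second snippet >(.*?)<\/a> starts at the first > in the string and captures ` <a class="link" href="/img"> Lake `.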
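The try-one-pattern-then-the-other approach can be sketched as follows. `extract` and `patterns` are hypothetical names. The sketch uses `[^<>]*?` inside each group so the capture cannot span a tag, and the `\s*` around each group trims the spaces:

```javascript
// Patterns tried in order; group 1 holds the cell text without surrounding spaces.
const patterns = [
  />\s*([^<>]*?)\s*<\/a>/, // cell that wraps a link
  />\s*([^<>]*?)\s*<\/td>/, // plain cell
];

// Hypothetical helper: return group 1 from the first pattern that matches.
function extract(html) {
  for (const re of patterns) {
    const m = re.exec(html);
    if (m) return m[1];
  }
  return null;
}

console.log(extract('<td style="work" class="sort"> 1500 </td>')); // 1500
console.log(extract('<td style="work"> <a class="link" href="/img"> Lake </a> </td>')); // Lake
```

Order matters: the link pattern goes first because a cell containing a link also ends in `</td>`.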

However, parsing HTML with regex is generally not recommended. Read about it here.

Edit: MDR's solution (^.*?> (\w+) <.*?$) works if you only want to capture a single run of word characters with exactly one space on each side of it.
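A quick check of MDR's pattern against both snippets, plus one made-up cell whose text contains a space, shows where the `\w+` and the single literal spaces stop it from matching:

```javascript
// MDR's pattern: anchored, exactly one space on each side, word characters only.
const mdr = /^.*?> (\w+) <.*?$/;

const inputs = [
  '<td style="work" class="sort"> 1500 </td>',
  '<td style="work"> <a class="link" href="/img"> Lake </a> </td>',
  '<td style="work"> Lake Tahoe </td>', // made-up input: text contains a space
];

for (const html of inputs) {
  const m = mdr.exec(html);
  console.log(m ? m[1] : null); // prints 1500, then Lake, then null
}
```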

edited Mar 18 '20 at 16:25

answered Mar 18 '20 at 16:08

