Regex html tag data

Question

I'm fetching a page (HTTP request) and trying to extract certain data from it using a regex, for example from this part of the HTML:

<tr><th>Continent:</th><td class='trc'>Europe (EU)</td></tr>

How can I get the 'Europe (EU)' out of this?

I've tried this regex:

/<th>Continent:<\/th><td class='trc'>(.+)\s<\/td>/

But this does not match anything.

You should not be using regexes to parse HTML. Use an HTML parser for that... — plalx, May 05 '13 at 14:02
This is for an mIRC script, but I figured regexes work the same in the mIRC scripting language as in PHP? — Kaj, May 05 '13 at 14:03
@plalx depending on the intent, using a full-blown SGML parser to extract a single bit of data is like attacking a rubber boat with naval artillery. There are plenty of use cases where simply extracting a few bits of data from HTML with regular expressions is preferable to a full-blown parser. It's often even more resilient too, since the regex method will survive minor changes in the source page structure. — Niels Keurentjes, May 05 '13 at 14:23

melwil · Accepted Answer · 2013-05-05T14:27:07.807

2

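You are telling the regex to look for a whitespace character (\s) immediately followed by </td>. In your HTML there is no whitespace between 'Europe (EU)' and </td>, so the pattern never matches.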

/<th>Continent:<\/th><td class='trc'>(.+)\s<\/td>/  
                                         ^^
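
To see the failure concretely, here is a minimal sketch in Python. It assumes Python's re module treats this pattern the same way as the regex engine in mIRC or PHP; the variable names are illustrative and not from the question. The \/ escapes are dropped because Python patterns have no /.../ delimiters.

```python
import re

# The HTML fragment from the question.
html = "<tr><th>Continent:</th><td class='trc'>Europe (EU)</td></tr>"

# The original pattern: \s demands a whitespace character right before </td>.
original = re.compile(r"<th>Continent:</th><td class='trc'>(.+)\s</td>")

# No whitespace precedes </td> in the fragment, so there is no match.
no_match = original.search(html)

# With a trailing space inside the cell, the same pattern does match.
padded = html.replace("(EU)</td>", "(EU) </td>")
with_space = original.search(padded)

print(no_match)             # None
print(with_space.group(1))  # Europe (EU)
```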

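I'd recommend using [^<>]+ to match the text between HTML tags. It matches one or more characters that are not angle brackets, so the capture cannot run past the closing tag.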

/<th>Continent:<\/th><td class='trc'>([^<>]+)<\/td>/
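
A minimal sketch of this corrected pattern in use, written in Python on the assumption that its re module handles the pattern like the regex engine in mIRC or PHP. The helper name extract_continent is hypothetical and not something from the question.

```python
import re

# [^<>]+ matches one or more characters that are not angle brackets,
# so the capture stops at the "<" of the closing </td>.
CONTINENT_RE = re.compile(r"<th>Continent:</th><td class='trc'>([^<>]+)</td>")


def extract_continent(html):
    """Return the text of the Continent cell, or None if it is absent."""
    match = CONTINENT_RE.search(html)
    return match.group(1) if match else None


html = "<tr><th>Continent:</th><td class='trc'>Europe (EU)</td></tr>"
print(extract_continent(html))  # Europe (EU)
```

If the cell could contain leading or trailing whitespace, calling .strip() on the returned string is one simple way to trim it.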

edited May 05 '13 at 14:27

answered May 05 '13 at 14:20
