
I'm fetching a page (HTTP request) and I'm trying to extract certain data from it using a regex, for example from this part of the HTML:

<tr><th>Continent:</th><td class='trc'>Europe (EU)</td></tr>

How can I get the 'Europe (EU)' out of this?

I've tried this regex:

/<th>Continent:<\/th><td class='trc'>(.+)\s<\/td>/

But this does not work.

Matt Ball
Kaj
  • You should not be using regexes to parse HTML. Use an HTML parser for that... – plalx May 05 '13 at 14:02
  • http://stackoverflow.com/q/1732348/139010 – Matt Ball May 05 '13 at 14:03
  • This is for a mIRC script but I figured that regexes are the same in mIRC scripting language as in PHP? – Kaj May 05 '13 at 14:03
  • @plalx Depending on the intent, using a full-blown SGML parser to extract a single bit of data is like attacking a rubber boat with naval artillery. There are plenty of use cases where simply extracting a few bits of data from HTML with regular expressions is preferable to a full-blown parser. It's often more resilient too, since the regex method will survive minor changes in the source page structure. – Niels Keurentjes May 05 '13 at 14:23

1 Answer


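You are telling the regex to look for a whitespace character (\s) immediately followed by </td>. In your HTML there is no whitespace between 'Europe (EU)' and </td>, so the pattern never matches.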

/<th>Continent:<\/th><td class='trc'>(.+)\s<\/td>/  
                                         ^^

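I'd recommend using [^<>]+ to capture the text between HTML tags. It matches one or more characters that are not angle brackets, so the capture stops at the next tag: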

/<th>Continent:<\/th><td class='trc'>([^<>]+)<\/td>/
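
To check the difference between the two patterns, here is a small sketch in Python. Python is only used for illustration (the question is about mIRC, whose regex flavour should behave the same for these constructs, but that is an assumption), and the variable names are made up for the example.

```python
import re

# The HTML fragment from the question.
html = "<tr><th>Continent:</th><td class='trc'>Europe (EU)</td></tr>"

# Original pattern: \s requires whitespace right before </td>, which is not there.
original = re.compile(r"<th>Continent:</th><td class='trc'>(.+)\s</td>")

# Corrected pattern: capture everything up to the next angle bracket.
fixed = re.compile(r"<th>Continent:</th><td class='trc'>([^<>]+)</td>")

print(original.search(html))  # None, the original pattern does not match

match = fixed.search(html)
continent = match.group(1) if match else None
print(continent)  # Europe (EU)
```

If the cell could contain trailing whitespace in other pages, \s* (zero or more) in place of \s would tolerate it without requiring it.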
melwil