
In a large HTML document, I have multiple lines that look like this. The value 'TEST' can be different. I want to pick up TEST or whatever else is in its place.

<TD width=300 valign=top><FONT COLOR=800000 size=3>TEST</FONT><BR>

I have this regex:

$regex = "/<FONT COLOR=800000 size=3>[\w.&,\s]*<\/FONT>/";

It picks up all the lines that look like the one I posted above. How can I pick up only TEST instead of the entire line?

Ayush

3 Answers

$regex = "/<FONT COLOR=800000 size=3>([\w.&,\s]*)<\/FONT>/";
preg_match($regex, $string, $matches);

You will have all matches in the `$matches` array; `$matches[1]` should be your "TEST".
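A minimal, self-contained sketch of the approach above. The sample `$string` is an assumption modelled on the line in the question; the pattern is the one from this answer, written in single quotes so the backslashes reach the regex engine unchanged.

```php
<?php
// Sample input modelled on the line from the question (an assumption).
$string = '<TD width=300 valign=top><FONT COLOR=800000 size=3>TEST</FONT><BR>';

// The parentheses create capture group 1 around the text between the tags.
$regex = '/<FONT COLOR=800000 size=3>([\w.&,\s]*)<\/FONT>/';

$text = null;
if (preg_match($regex, $string, $matches) === 1) {
    // $matches[0] is the whole match, $matches[1] is the first group.
    $text = $matches[1];
}

echo $text, PHP_EOL;
```

Checking the return value of `preg_match` before reading `$matches[1]` avoids an undefined-index notice when a line does not match.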

boobiq
  • and if I want to use `preg_match_all`? – Ayush Jan 06 '12 at 13:41
  • Just use `preg_match_all` instead of `preg_match`; the rest is the same. You can see the contents of `$matches` by printing it, for example with `print_r($matches)` or `var_dump($matches)`. – boobiq Jan 06 '12 at 13:45
  • `preg_match_all` will give `$matches` wrapped in an additional array (since it returns multiple results). With the `PREG_SET_ORDER` flag you can access your TEST with `$matches[$i][1]`, for `0 <= $i < count($matches)`; with the default `PREG_PATTERN_ORDER` it is `$matches[1][$i]`. – Amadan Jan 06 '12 at 13:51
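To make the two `preg_match_all` layouts from the comments concrete, here is a short sketch. The two-line `$html` sample and the variable names are assumptions for illustration only.

```php
<?php
// Two sample lines modelled on the question (the values are assumptions).
$html = '<TD width=300 valign=top><FONT COLOR=800000 size=3>TEST</FONT><BR>' . "\n"
      . '<TD width=300 valign=top><FONT COLOR=800000 size=3>Other, Inc.</FONT><BR>';

$regex = '/<FONT COLOR=800000 size=3>([\w.&,\s]*)<\/FONT>/';

// Default PREG_PATTERN_ORDER: $byGroup[1] holds every group-1 capture.
preg_match_all($regex, $html, $byGroup);
$texts = $byGroup[1];

// PREG_SET_ORDER: one array per match, so the text is $bySet[$i][1].
preg_match_all($regex, $html, $bySet, PREG_SET_ORDER);
$textsFromSets = array_column($bySet, 1);

print_r($texts);
```

Both calls find the same captures; only the nesting of the result array differs, so pick whichever is easier to loop over.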

First off, obligatory link.

If you really want to regexp it, put parentheses around `[\w.&, ]*` to capture the content into a group, then read off the group instead of the whole match. EDIT: I see @boobiq shows you exactly how to do this, so I'm not gonna. :p

Amadan

You might want to use groups, thus changing your regex to this:

"/<FONT COLOR=800000 size=3>([\w.&, ]*)<\/FONT>/"

The round brackets denote groups.

Ideally you should never parse HTML with regex (why not?)... you should use a framework like the PHP Simple HTML DOM Parser.
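As a rough illustration of the parser route, the sketch below uses PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`) rather than the Simple HTML DOM Parser, so it is a swapped-in technique and not that library's API. It assumes the DOM extension is enabled; the sample `$html`, including the table wrapper, is also an assumption.

```php
<?php
// Sample fragment modelled on the question; the table wrapper is an assumption.
$html = '<table><tr><TD width=300 valign=top>'
      . '<FONT COLOR=800000 size=3>TEST</FONT><BR></TD></tr></table>';

// Collect parser warnings about non-standard markup instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// The HTML parser lower-cases tag and attribute names, hence "font" and "color".
$xpath = new DOMXPath($doc);
$texts = [];
foreach ($xpath->query('//font[@color="800000" and @size="3"]') as $node) {
    $texts[] = trim($node->textContent);
}

print_r($texts);
```

Selecting by attributes this way does not depend on attribute order or on which characters appear in the text, which are the two places the regex is most fragile.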

npinti
  • I was staying away from DOM parsers since I didn't want to make the assumption of valid HTML. However, your link says it supports invalid HTML as well. I'll definitely have a look at it. – Ayush Jan 06 '12 at 13:42
  • I'm loving the DOM Parser! Ditched regex for it. – Ayush Jan 06 '12 at 14:14
  • @xbonez: Then why not select this as the answer? – npinti Jan 06 '12 at 14:15