I am trying to grab a numeric value (i.e. 105) from an HTML page using preg_match. Here is the relevant HTML:

<p>
                External Backlinks
            </p>
            <p style="font-size: 150%;">
                <b>105</b>
            </p>

And I am using the following regex:

$url = 'http://www.example.com/test.html';

preg_match('#<p>External Backlinks</p><p style="font-size: 150%;"><b>([0-9\.]+)#', file_get_contents($url), $matches);

echo $matches[1];

But it's not returning the correct value. Please help me fix the above regex. Thanks.

seoppc

1 Answer

I don't recommend using regex to parse HTML. Use a DOM parser instead. Read this rant for more information about why :)
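For illustration, here is a minimal sketch of the DOM approach using PHP's bundled DOMDocument and DOMXPath classes. The markup is copied from the question into a string so the example is self-contained, and the XPath expression assumes the number always sits in a <b> inside the first <p> that follows the "External Backlinks" label; a real page may need a different query.

```php
<?php
// Sample markup copied from the question. In practice this would come from
// file_get_contents($url).
$html = <<<HTML
<p>
                External Backlinks
            </p>
            <p style="font-size: 150%;">
                <b>105</b>
            </p>
HTML;

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them;
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Find the <p> whose whitespace-normalized text is the label, then take
// the <b> inside the next <p> sibling (assumed page layout).
$nodes = $xpath->query(
    '//p[normalize-space(.)="External Backlinks"]/following-sibling::p[1]/b'
);

$value = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $value;
```

Because normalize-space() collapses the surrounding whitespace, the indentation in the source no longer matters.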

To answer your question, here's a working regex for your example:

<p>[^E]*External Backlinks[^<]*<\/p>[^<]*<p style="font-size: ?150%;">[^<]*<b>(\d+)<\/b>[^<]*<\/p>

It's ugly, but it works... Don't use it.

preg_match('#<p>[^E]*External Backlinks[^<]*<\/p>[^<]*<p style="font-size: ?150%;">[^<]*<b>(\d+)<\/b>[^<]*<\/p>#', file_get_contents($url), $matches);

echo $matches[1];

Output:

105

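The problem with your regex was that it didn't account for the whitespace (newlines and indentation) between the tags in the HTML source. The slashes were not the issue: with # as the delimiter they don't need escaping, although escaping them as I did above is harmless.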

If the source looked something like this:

<p>External Backlinks</p><p style="font-size: 150%;"><b>105</b></p>

Yours would have worked, although it wouldn't be very robust. (Though I guess one could argue that using regex to parse HTML is never very robust.)
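If you do stay with a regex, a somewhat more readable way to tolerate the whitespace is to put \s* between the tags. This is only a sketch tested against the snippet from the question (copied into a string here); it assumes the tags and the style attribute appear exactly as shown, and it is just as fragile as any other HTML regex.

```php
<?php
// Sample markup copied from the question. In practice this would come from
// file_get_contents($url).
$html = <<<HTML
<p>
                External Backlinks
            </p>
            <p style="font-size: 150%;">
                <b>105</b>
            </p>
HTML;

// \s* allows any amount of whitespace (including newlines) between tags.
// With # as the delimiter, the forward slashes need no escaping.
$pattern = '#<p>\s*External Backlinks\s*</p>\s*'
         . '<p style="font-size: ?150%;">\s*<b>(\d+)</b>#';

$result = null;
if (preg_match($pattern, $html, $matches)) {
    $result = $matches[1];
    echo $result;
}
```

Checking the return value of preg_match before reading $matches[1] avoids an undefined-index notice when the page layout changes and the pattern no longer matches.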

ohaal