preg_match issue

Question

I am trying to grab a numeric value (i.e. 105) from an HTML page using preg_match. Here is the relevant HTML:

<p>
                External Backlinks
            </p>
            <p style="font-size: 150%;">
                <b>105</b>
            </p>

And I am using the following regex:

$url = 'http://www.example.com/test.html';

preg_match('#<p>External Backlinks</p><p style="font-size: 150%;"><b>([0-9\.]+)#', file_get_contents($url), $matches);

echo $matches[1];

But it's not returning the correct value. Please help me fix the above regex. Thanks.

For HTML, don't use *regex*, use *xpath*. Xpath are "regular" expressions for HTML/XML, e.g. `//p[@style="font-size: 150%;"]/b`. — hakre, Feb 21 '12 at 22:04

score 0 · Accepted Answer · edited May 23 '17 at 11:48

I don't recommend using regex to parse HTML. Use a DOM parser instead. Read this rant for more information about why :)
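As a rough illustration of the DOM approach, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The helper name `extractBacklinkCount` is made up for this example, and the XPath query assumes the number always sits in a `<b>` inside the `<p>` immediately following the "External Backlinks" paragraph, as in the markup from the question.

```php
<?php
// Sample markup copied from the question; a real page would be fetched instead.
$html = <<<'HTML'
<p>
    External Backlinks
</p>
<p style="font-size: 150%;">
    <b>105</b>
</p>
HTML;

/**
 * Hypothetical helper: returns the number found in the <b> of the <p> that
 * directly follows the "External Backlinks" paragraph, or null if not found.
 */
function extractBacklinkCount(string $html): ?int
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them;
    // real-world HTML is often not valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // normalize-space() trims and collapses the whitespace around the label.
    $query = '//p[normalize-space(.)="External Backlinks"]/following-sibling::p[1]/b';
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $text = trim($nodes->item(0)->textContent);
    return ctype_digit($text) ? (int) $text : null;
}

$count = extractBacklinkCount($html);
echo $count, "\n";
```

Because the query matches on the label text rather than on the inline style, it should keep working if the styling changes, though it still depends on the two paragraphs being siblings.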

To answer your question, here's a working regex for your example:

<p>[^E]*External Backlinks[^<]*<\/p>[^<]*<p style="font-size: ?150%;">[^<]*<b>(\d+)<\/b>[^<]*<\/p>

It's ugly, but it works... Don't use it.

preg_match('#<p>[^E]*External Backlinks[^<]*<\/p>[^<]*<p style="font-size: ?150%;">[^<]*<b>(\d+)<\/b>[^<]*<\/p>#', file_get_contents($url), $matches);

echo $matches[1];

Output:

105

The problem with your regex was that it didn't account for the whitespace in the HTML source. Escaping the slashes, as I did above, is harmless but not required, because the pattern uses `#` rather than `/` as its delimiter.

If the source looked something like this:

<p>External Backlinks</p><p style="font-size: 150%;"><b>105</b></p>

Yours would have worked, though it would not have been very robust. (Though I guess one could argue that using regex to parse HTML is never very robust.)
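To make the whitespace point concrete, here is a small sketch that runs the pattern from the question against the indented markup, then a variant with `\s*` inserted wherever whitespace may appear. The variable names are only illustrative, and the tolerant pattern is one possible rewrite, not the only one.

```php
<?php
// Indented markup, as in the question.
$html = <<<'HTML'
<p>
                External Backlinks
            </p>
            <p style="font-size: 150%;">
                <b>105</b>
            </p>
HTML;

// Pattern from the question: expects the tags to be directly adjacent.
$strict = '#<p>External Backlinks</p><p style="font-size: 150%;"><b>([0-9\.]+)#';

// Same pattern with \s* wherever the markup may contain whitespace.
// With # as the delimiter, the forward slashes need no escaping.
$tolerant = '#<p>\s*External Backlinks\s*</p>\s*<p style="font-size: 150%;">\s*<b>([0-9.]+)#';

$strictHit   = preg_match($strict, $html, $m1);   // no match on the indented markup
$tolerantHit = preg_match($tolerant, $html, $m2); // matches and captures the number

// Check the return value before reading the capture group.
$value = $tolerantHit === 1 ? $m2[1] : null;
var_dump($strictHit, $value);
```

Checking the return value of preg_match before reading `$matches[1]` also avoids an undefined-index notice when the page layout changes and the pattern stops matching.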
