PHP regular expression to match a div

Question

This is my code:

<?php

/**
 * @author Joomlacoders
 * @copyright 2010
 */
    $url="http://urlchecker.net/html/demo.html";

    $innerHtml=file_get_contents($url);

    //echo $innerHtml;
    preg_match_all("{\<div id='news-id-.*d'\>(.*)\</div\>}",$innerHtml,$matches);

          //<div id='news-id-160346'>            

    var_dump($matches);

?>

I want to find all the content in the div with id='news-id-160346'. Please help me.

score 6 · Accepted Answer · edited May 23 '17 at 12:00

Use an HTML parser. NOT regular expressions.

The problem with regular expressions is that they cannot match nested structures. Assuming your regex must match a single <div> and its closing tag, there is no way to correctly match this input:

<div id="a">
    <div id="b">
        Foo
    </div>
</div>
<div id="c">
    Bar
</div>

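If your regular expression is greedy, it will match from the first opening tag through the last closing tag, swallowing both top-level divs. If it's ungreedy, it will stop at the first </div>, which closes the inner div rather than the div the match started on.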
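As a rough illustration of both failure modes, here is a minimal sketch that runs a greedy and an ungreedy pattern against the sample markup above. The patterns and variable names are made up for this demonstration and are not taken from the question.

```php
<?php
// Sample markup from the answer: a nested div followed by a sibling div.
$html = <<<HTML
<div id="a">
    <div id="b">
        Foo
    </div>
</div>
<div id="c">
    Bar
</div>
HTML;

// Greedy: (.*) runs to the LAST </div>, so div "c" is swallowed as well.
preg_match('#<div id="a">(.*)</div>#s', $html, $greedy);

// Ungreedy: (.*?) stops at the FIRST </div>, which closes div "b", not "a".
preg_match('#<div id="a">(.*?)</div>#s', $html, $lazy);

// The greedy capture contains the unrelated sibling div.
$greedyHasC = strpos($greedy[1], 'id="c"') !== false;

// The ungreedy capture opens div "b" but never closes it.
$lazyIsUnbalanced = substr_count($lazy[1], '<div') === 1
    && substr_count($lazy[1], '</div>') === 0;

var_dump($greedyHasC, $lazyIsUnbalanced);
```

Neither capture is the content of div "a" alone, which is the behaviour the paragraph above describes.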

Therefore, you should use an HTML parser. With PHP, DOMDocument::loadHTML and DOMDocument::loadHTMLFile each do a fairly good job. (You may "safely" ignore the warnings they generate: they only report markup errors, and the resulting DOMDocument object should be pretty much okay.)
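If you would rather inspect the markup problems than have them printed as PHP warnings, one option is libxml's internal error buffer. The sketch below assumes a small, deliberately malformed string (the stray </a> tag and the id 'news-id-1' are invented for illustration). Which messages the parser reports depends on the libxml2 version, so no particular output is claimed.

```php
<?php
// Deliberately malformed markup (assumed example): a stray </a> end tag.
$html = "<html><body><div id='news-id-1'>Hello</a></div></body></html>";

// Collect parse problems internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$d = new DOMDocument();
$loaded = $d->loadHTML($html);

// Fetch whatever the parser complained about, then restore the old setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    echo trim($error->message), " (line ", $error->line, ")\n";
}

// Despite any markup errors, the document is still usable.
$divCount = $d->getElementsByTagName('div')->length;
echo "divs found: ", $divCount, "\n";
```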

Since PHP's getElementById is a pain to get to work, you can use DOMXPath for the same purpose:

<?php

$url = "http://urlchecker.net/html/demo.html";

$d = new DOMDocument();
$d->loadHTMLFile($url);

$xpath = new DOMXPath($d);
// Select the element (of any tag name) whose id attribute has this value.
$myNews = $xpath->query('//*[@id="news-id-160346"]')->item(0);

?>
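To get at the content of the div once the node is found, one possible approach is to serialise its child nodes with saveHTML. This is only a sketch: it uses an inline stand-in string instead of the real page (the markup inside the div is assumed), and it selects the element with the predicate form //*[@id="..."].

```php
<?php
// Stand-in markup (assumed); the real page would come from loadHTMLFile($url).
$html = "<html><body>"
      . "<div id='news-id-160346'><p>First</p><div>Nested</div></div>"
      . "<div id='other'>Skip me</div>"
      . "</body></html>";

$d = new DOMDocument();
@$d->loadHTML($html);   // '@' silences warnings about markup errors

$xpath = new DOMXPath($d);
$myNews = $xpath->query('//*[@id="news-id-160346"]')->item(0);

// Serialise each child node to rebuild the inner HTML of the div.
$innerHtml = '';
if ($myNews !== null) {
    foreach ($myNews->childNodes as $child) {
        $innerHtml .= $d->saveHTML($child);
    }
}

echo $innerHtml, "\n";
```

Because the parser tracks nesting, the inner div is included and the sibling div is not, which is the case the regular expressions cannot handle.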

Hello, I have tried all the answers, but without success. Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Unexpected end tag : a in http://urlchecker.net/html/demo.html, line: 26 in /home/urlcheck/public_html/html/test.php on line 10 — Thoman, Jun 01 '10 at 05:59
@Thoman: it's actually been successful. loadHTMLFile simply tells you the problems it encountered while parsing. You can shut it up with the `@` operator: `@$d->loadHTMLFile($url);` — zneak, Jun 01 '10 at 06:26
I tried it, but this code doesn't match all the content in id='news-id-160346' — Thoman, Jun 01 '10 at 07:13

Amarghosh · Answer 2 · 2010-06-01T06:01:57.937

0

Use a parser as others suggested.

Or try this regex:

preg_match_all("#<div [^>]*id=['\"]news-id-\\d+['\"](.*?)</div>#", $innerHtml, $matches);
print_r($matches);

Check the output of the print_r statement to understand why regex is not considered the right tool for parsing HTML.

edited Jun 01 '10 at 06:01

answered Jun 01 '10 at 05:09


@Thoman Read my last line again. It won't match - that is the whole point - it can't be fixed. – Amarghosh Jun 01 '10 at 05:50
