Scraping contents with PHP dom

Question

How can I use PHP DOM screen scraping to pull the contents of an HTML tag like

<li style="margin-top:10px">

that appears in one of my pages?

I want to get the full contents of the <li> tag and display them as HTML code.

I find using preg_match to be sufficient for scraping. Also, the HTML doesn't have to be well-formed XML. – Gerben, Jun 11 '11 at 20:26
@Gerben: Please, never suggest using regexes for HTML parsing again :/ http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – ThiefMaster, Jun 11 '11 at 20:38
LOL, the Regex Enforcement Agency arrived in only a minute this time. – Michael Berkowski, Jun 11 '11 at 20:42
@ThiefMaster He doesn't want to parse the HTML, just extract a certain part of it. – Gerben, Jun 11 '11 at 20:43

Michael Berkowski · Accepted Answer · 2011-06-11T23:19:54.460

4

Use SimpleXML and XPath. Supposing your HTML is all stored in the string $html, this may fit your need, provided the markup is well-formed enough to be parsed as XML:

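// Load your HTML from a file
$html = file_get_contents("/path/to/page.html");
$xml = simplexml_load_string($html);
if ($xml === false) {
    exit("The page is not well-formed XML");
}

// xpath() returns an array of matching elements
foreach ($xml->xpath("//li[@style='margin-top:10px']") as $li) {
    echo $li->asXML();
}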
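
Since the question asks about PHP DOM specifically, here is a minimal sketch using DOMDocument and DOMXPath, which accept HTML that is not well-formed XML. The helper name extractLiByStyle and the sample markup are hypothetical, chosen only for illustration, and the sketch assumes the style value contains no single quote because it is embedded in the XPath string.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the outer
// HTML of every <li> whose style attribute equals $style exactly.
function extractLiByStyle(string $html, string $style): array
{
    $doc = new DOMDocument();

    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];
    // Assumption: $style contains no single quote.
    foreach ($xpath->query("//li[@style='" . $style . "']") as $node) {
        // saveHTML($node) serializes just that node, tags included.
        $results[] = $doc->saveHTML($node);
    }
    return $results;
}

// Sample markup with an unclosed <p>, which an XML parser would reject.
$html = '<html><body><p>Intro<ul><li style="margin-top:10px">hello <b>World</b></li></ul></body></html>';

foreach (extractLiByStyle($html, 'margin-top:10px') as $li) {
    // htmlspecialchars() makes the markup show up as code in a web page.
    echo htmlspecialchars($li), "\n";
}
```

Drop the htmlspecialchars() call if you want the browser to render the list item rather than show its source.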

edited Jun 11 '11 at 23:19

answered Jun 11 '11 at 20:27

Michael Berkowski


2

@Callum Whyte See addition above : `file_get_contents()` – Michael Berkowski Jun 11 '11 at 20:34

score 1 · Answer 2 · answered Jun 11 '11 at 20:38

1

$html='<li style="margin-top:10px">hello <b>World</b></li>';
if( preg_match('|<li style="margin-top:10px">(.*?)</li>|', $html, $matches) )
{
  $licontent = $matches[1];
}
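
One caveat with the pattern above is that "." does not match newlines by default, so it finds nothing when the <li> content spans several lines. A hedged variant, using a made-up multi-line sample string, adds the "s" modifier for that case and "i" to tolerate upper-case tags:

```php
<?php
// Hypothetical sample input: the <li> content spans two lines.
$html = "<ul>\n<li style=\"margin-top:10px\">hello\n<b>World</b></li>\n</ul>";

// "s" lets "." match newlines; "i" makes the tag match case-insensitive.
if (preg_match('|<li style="margin-top:10px">(.*?)</li>|si', $html, $matches)) {
    $licontent = $matches[1];
    // Escape the captured markup so it is displayed as code.
    echo htmlspecialchars($licontent), "\n";
}
```

This still stops at the first closing </li>, so it would cut the content short if the item contained a nested list.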

answered Jun 11 '11 at 20:38

Gerben


1

This will actually do the job at hand. Just don't get into the habit of using regex for more complicated parsing. – Michael Berkowski Jun 11 '11 at 20:52
