How can I use PHP DOM screen scraping to pull the contents of an HTML tag like this one

<li style="margin-top:10px">

from one of my pages?

I want to get the full contents of the <li> tag and display them as HTML code.

Ross
Callum Whyte
  • I find using preg_match to be sufficient for scraping. Also, the HTML doesn't have to be well-formed XML. – Gerben Jun 11 '11 at 20:26
  • @Gerben: Please.. never suggest using regexes for HTML parsing again :/ http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – ThiefMaster Jun 11 '11 at 20:38
  • LOL the Regex Enforcement Agency arrived in only a minute this time. – Michael Berkowski Jun 11 '11 at 20:42
  • @ThiefMaster He doesn't want to parse the HTML, just extract a certain part of it. – Gerben Jun 11 '11 at 20:43

2 Answers

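Use SimpleXML and XPath. Note that SimpleXML only parses well-formed markup. Supposing your HTML is all stored in the string $html, this may fit your need:

// Load your HTML from a file
$html = file_get_contents("/path/to/page.html");
$xml = simplexml_load_string($html);

// xpath() returns an array of matching elements, so loop over it
$items = $xml->xpath("//li[@style='margin-top:10px']");
foreach ($items as $li) {
    echo $li->asXML();
}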
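
Since the question asks about PHP DOM specifically, the same XPath query can also be run through DOMDocument, whose loadHTML() tolerates markup that is not well-formed XML. What follows is only a rough sketch under a few assumptions: the sample markup and the helper name extractStyledItems are made up for illustration, and the style attribute is assumed to match exactly, character for character.

```php
<?php
// Hypothetical sample markup; in practice this would come from file_get_contents().
$html = '<ul><li>skip me</li><li style="margin-top:10px">hello <b>World</b></li></ul>';

// Hypothetical helper: returns the outer HTML of every <li> whose style
// attribute equals $style exactly.
function extractStyledItems(string $html, string $style): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them for sloppy HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];
    // Assumes $style contains no single quote, since it is embedded in the query.
    foreach ($xpath->query("//li[@style='" . $style . "']") as $node) {
        // saveHTML($node) serializes just that node, tags included.
        $results[] = $doc->saveHTML($node);
    }
    return $results;
}

$items = extractStyledItems($html, 'margin-top:10px');
foreach ($items as $item) {
    // Escape the markup so a browser shows it as code rather than rendering it.
    echo htmlspecialchars($item), PHP_EOL;
}
```

The htmlspecialchars() call is there because the goal is to display the markup as code; drop it if the <li> should be rendered instead.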
Michael Berkowski
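// Sample input; in practice this would be the fetched page source
$html='<li style="margin-top:10px">hello <b>World</b></li>';
// The lazy (.*?) captures everything up to the first closing </li>;
// the s modifier lets it span line breaks
if( preg_match('|<li style="margin-top:10px">(.*?)</li>|s', $html, $matches) )
{
  $licontent = $matches[1];
}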
Gerben