3

Possible Duplicate:
How to parse HTML with PHP?

I need to parse a string inside a td tag. I can do this using jQuery with the following:

$("#right .olddata:first td.numeric:first").html()

If I have the HTML code in a string variable, how can I get the content of the same td?

Community
  • If you google "PHP Dom parser", you could just use one of those libraries – Albinoswordfish Sep 19 '11 at 20:30
  • Which one of the top 10 results in Google for "parse dom in php" wasn't sufficient? – CodeCaster Sep 19 '11 at 20:30
  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662). Have a look at [phpQuery](http://code.google.com/p/phpquery/) – Gordon Sep 19 '11 at 20:54

4 Answers

8

Simple HTML DOM

Simple HTML DOM provides an object-oriented way of accessing the HTML DOM in PHP. I've used it before with a lot of success, but it will choke on a large DOM structure. A nice feature is the ability to manipulate the DOM and save it, using the same object-oriented design. It also lets you run selector searches against the DOM:

// Find all <div> elements whose id attribute is foo
$ret = $html->find('div[id=foo]');

or:

// Find all <li> in <ul> 
foreach($html->find('ul') as $ul) 
{
       foreach($ul->find('li') as $li) 
       {
             // do something...
       }
}

// Find first <li> in first <ul> 
$e = $html->find('ul', 0)->find('li', 0);

And it allows for traversal:

echo $html->getElementById("div1")->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');

DOMDocument

As others have noted, you can also use DOMDocument.
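For completeness, a minimal sketch of the DOMDocument route might look like the following. The HTML string and the variable names are made up for illustration; only DOMDocument and the libxml error functions come from PHP itself.

```php
<?php
// Hypothetical HTML string standing in for the page in the question.
$html = '<div id="right"><table><tr class="olddata">'
      . '<td class="numeric">42</td><td class="numeric">7</td>'
      . '</tr></table></div>';

// Real-world HTML is often malformed; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Walk every <td> and keep the text of those whose class attribute is exactly "numeric".
$values = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    if ($td->getAttribute('class') === 'numeric') {
        $values[] = $td->textContent;
    }
}

print_r($values);
```

This works without any extra library, but filtering by hand gets tedious quickly, which is where XPath comes in.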

XPath

In my experience, XPath is harder to get working, but it's worth it if you only need to extract information from the DOM.

Although it isn't exactly what you're trying to extract, here's how I've used XPath to pull data out of an XML document:

The XML:

<?xml version="1.0" encoding="utf-8"?>
<Report>
  <CampaignPerformanceReportColumns>
    <Column name="AccountName" />
    ...
    <Column name="CampaignId" />
  </CampaignPerformanceReportColumns>
  <Table>
    <Row>
      <CampaignName value="Auctions" />
      <GregorianDate value="8/11/2010" />
      ...
      <CampaignId value="60312546" />
    </Row>
    <Row>
      <CampaignName value="Auctions" />
      <GregorianDate value="8/11/2010" />
      ...
      <CampaignId value="60312546" />
    </Row>
    <Row>
      <CampaignName value="Auctions 2" />
      <GregorianDate value="8/11/2010" />
      ...
      <CampaignId value="603125467" />
    </Row>
  </Table>
</Report>

PHP:

$xml = simplexml_load_file($file);

// Get each Row
$result = $xml->xpath("Table/Row");

// Get the CampaignId of each Row
$result = $xml->xpath("//Row/CampaignId");

XPath has many more features; I'd encourage you to explore it if you need to extract a lot of info from any XML-structured document.
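As a rough illustration of two of those features, here is a sketch that selects attributes directly and filters rows with a predicate. It inlines a trimmed-down copy of the report so it runs on its own; the variable names are my own choice, not part of the original script.

```php
<?php
// Trimmed-down copy of the report, inlined so the sketch is self-contained.
$xmlString = <<<'XML'
<Report>
  <Table>
    <Row>
      <CampaignName value="Auctions" />
      <CampaignId value="60312546" />
    </Row>
    <Row>
      <CampaignName value="Auctions 2" />
      <CampaignId value="603125467" />
    </Row>
  </Table>
</Report>
XML;

$xml = simplexml_load_string($xmlString);

// Select the value attribute of every CampaignId directly.
$ids = [];
foreach ($xml->xpath('//Row/CampaignId/@value') as $attr) {
    $ids[] = (string) $attr;
}

// A predicate narrows the match to rows with a given campaign name.
$match = $xml->xpath('//Row[CampaignName/@value="Auctions 2"]/CampaignId');
$auctions2Id = (string) $match[0]['value'];

print_r($ids);
echo $auctions2Id, "\n";
```

Casting to string matters here: xpath() hands back SimpleXMLElement objects, not plain values.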

Jonathan Beebe
4

You can use DOMDocument and DOMXPath.

Example (our HTML is in a string variable $html):

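$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Mirrors the jQuery selector; @class="..." only matches an exact class attribute
$cells = $xpath->query('//*[@id="right"]//*[@class="olddata"][1]//td[@class="numeric"][1]');
$td = $cells->item(0);
// nodeValue is the cell's text content (markup inside the <td> is dropped)
$tdText = $td->nodeValue;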

Demo: http://codepad.org/XmGPgrWp
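Two caveats about the query above: @class="olddata" only matches when that is the entire class attribute, and [1] after a // step means "first among its siblings", not "first match overall" the way jQuery's :first does. A sketch that tries to stay closer to the jQuery selector and returns the inner HTML, as .html() would, could look like this. The sample markup and the hasClass() helper are hypothetical, made up for the example.

```php
<?php
// Hypothetical markup mirroring the structure implied by the jQuery selector.
$html = '<div id="right"><table>'
      . '<tr class="olddata row"><td>label</td><td class="numeric"><b>1,234</b></td></tr>'
      . '<tr class="olddata"><td class="numeric">5,678</td></tr>'
      . '</table></div>';

libxml_use_internal_errors(true); // keep warnings about sloppy HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Illustrative helper: XPath test for "the class list contains $name".
function hasClass(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

// Parentheses make [1] pick the first node of the whole match, like :first.
$query = '((//*[@id="right"]//*[' . hasClass('olddata') . '])[1]'
       . '//td[' . hasClass('numeric') . '])[1]';

$xpath = new DOMXPath($doc);
$td = $xpath->query($query)->item(0);

// Rebuild the inner HTML by serialising each child node, similar to .html().
$innerHtml = '';
if ($td !== null) {
    foreach ($td->childNodes as $child) {
        $innerHtml .= $doc->saveHTML($child);
    }
}

echo $innerHtml, "\n";
```

If you only need the text, $td->textContent is enough and the saveHTML() loop can go.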

gen_Eric
2

You should definitely take a peek at DOMDocument->loadHTML().

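$doc = new DOMDocument();
$doc->loadHTML("<html><body><p id=\"foo\">bar</p></body></html>");

// getElementById() returns a DOMElement; read its text via nodeValue
$foo = $doc->getElementById('foo');
echo $foo->nodeValue; // Outputs 'bar'

// getElementsByTagName() returns a DOMNodeList, so pick an item first
$td = $doc->getElementsByTagName('td')->item(0);
echo $td ? $td->nodeValue : ''; // Outputs your <td> value. In this case, nothing.

genesis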
  • 2
    You could also try looking at [phpquery](http://code.google.com/p/phpquery/) - it's a PHP DOM parser inspired by jQuery. Never used it myself, but apparently it's as simple as: `phpQuery::newDocumentFileXHTML('my-xhtml.html')->find('p'); $ul = pq('ul');` – matthewhudson Sep 19 '11 at 21:03
0

I think you're looking for the PHP DOM extension. Alternatively, you could just match what you need using regular expressions.

Jeff
  • 3
    Using regular expressions to parse HTML is a bad idea. Mostly because HTML is anything but regular. – jasonbar Sep 19 '11 at 20:32
  • Trying to handle XML/HTML with regular expressions will definitely fail sooner or later. – KingCrunch Sep 19 '11 at 20:33
  • 4
    be careful what you say there, you might upset the "don't parse html with regex" brigade on SO. They're vicious – Evert Sep 19 '11 at 20:33