2

Possible Duplicate:
Regular expression for grabbing the href attribute of an A element

This displays the text between the `<a>` tags, but I would like a way to get the contents of the `href` attribute as well.

Is there a way to do that using DOMDocument?

$html = file_get_contents($uri);
$html = utf8_decode($html);

/*** a new dom object ***/
$dom = new domDocument;

/*** load the html into the object ***/
@$dom->loadHTML($html);

/*** discard white space ***/
$dom->preserveWhiteSpace = false;

/*** the table by its tag name ***/
$tables = $dom->getElementsByTagName('table');

/*** get all rows from the table ***/
$rows = $tables->item(0)->getElementsByTagName('tr');

/*** loop over the table rows ***/
foreach ($rows as $row)
{
    $a = $row->getElementsByTagName('a');
    /*** echo the values ***/
    echo $a->item(0)->nodeValue.'<br />';
    echo '<hr />';
}
kylex
  • duplicate of [Regular expression for grabbing the href attribute of an A element](http://stackoverflow.com/questions/3820666/regular-expression-for-grabbing-the-href-attribute-of-an-a-element) - accepted solution uses DOM. – Gordon Mar 11 '11 at 21:13

2 Answers

6

You're mere inches away from the answer: you've already extracted the `<a>` tags inside your `foreach` loop. `getElementsByTagName` hands them back in a `DOMNodeList`, and each item in that list is an instance of `DOMElement`, which has a method called `getAttribute`.

`$a->item(0)->getAttribute('href')` returns the string value of the `href` attribute. Tada!


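It's possible that you get an empty node list, for example when a row contains no `<a>` tag. In that case `item(0)` returns `null`, and calling `getAttribute` on it fails with the "non-object" error. You can work around this by checking that the first item in the list is an element.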

$href = null;
$first_anchor_tag = $a->item(0);
if($first_anchor_tag instanceof DOMElement)
    $href = $first_anchor_tag->getAttribute('href');
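
Putting the two pieces together, the loop from the question might look something like the sketch below. It is only a minimal, self-contained illustration: the `extract_links` helper name and the inline sample markup are made up here and stand in for the page fetched with `file_get_contents($uri)`.

```php
<?php
// Hypothetical helper (not from the question): returns text/href pairs
// for the first <a> in each row of the first <table> found in $html.
function extract_links(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings for sloppy markup instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    $table = $dom->getElementsByTagName('table')->item(0);
    if (!$table instanceof DOMElement) {
        return $links; // no table in the document
    }

    foreach ($table->getElementsByTagName('tr') as $row) {
        $anchor = $row->getElementsByTagName('a')->item(0);
        // item(0) is null when the row has no <a>, so guard before calling methods.
        if (!$anchor instanceof DOMElement) {
            continue;
        }
        $links[] = [
            'text' => $anchor->nodeValue,
            'href' => $anchor->getAttribute('href'),
        ];
    }

    return $links;
}

// Sample markup, made up for this example.
$html = '<table>'
      . '<tr><td><a href="/first">First</a></td></tr>'
      . '<tr><td>No link here</td></tr>'
      . '<tr><td><a href="/second">Second</a></td></tr>'
      . '</table>';

foreach (extract_links($html) as $link) {
    echo $link['text'] . ' => ' . $link['href'] . "\n";
}
```

Rows without a link are skipped rather than triggering the "non-object" error, which is the same `instanceof DOMElement` guard shown above applied inside the loop.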
Jacta
Charles
  • I believe you mean `getAttributeNode('href')`? – Nathan Ostgard Mar 11 '11 at 21:20
  • No, I mean I'm confused and didn't actually read the docs (that I took ever so very long to find) for `getAttribute` and blindly assumed it did the wrong thing (as is always the case with the DOM) and it returned a node instead of the actual value of the node. Oops! This has been fixed. – Charles Mar 11 '11 at 21:22
  • Ah, `getAttribute('href')` returns a string, which means you don't need the `->nodeValue`. Better, anyway. – Nathan Ostgard Mar 11 '11 at 21:22
  • When I use this I get the following error: `Call to a member function getAttribute() on a non-object` – kylex Mar 11 '11 at 21:35
  • It's quite possible that the DOMNodeList that you got back was empty. I'll edit my post with how to deal with this. – Charles Mar 11 '11 at 21:37
0
echo $a->item(0)->getAttributeNode('href')->nodeValue."<br />";
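
For comparison, here is a small sketch of how the two calls differ on a single element. `getAttributeNode` returns an attribute node whose `nodeValue` holds the string, while `getAttribute` returns the string directly. The sample markup and variable names are made up for illustration.

```php
<?php
$dom = new DOMDocument();
// Sample markup, made up for this example: one link with an href, one without.
$dom->loadHTML('<p><a href="/docs">Docs</a> <a>No href</a></p>');
$anchors = $dom->getElementsByTagName('a');

$with = $anchors->item(0);
$without = $anchors->item(1);

// getAttributeNode returns a DOMAttr object; its nodeValue holds the string.
$viaNode = $with->getAttributeNode('href')->nodeValue;

// getAttribute returns the string directly, so no ->nodeValue is needed.
$viaString = $with->getAttribute('href');

// For a missing attribute getAttribute returns an empty string,
// so hasAttribute is what distinguishes "missing" from "empty".
$missing = $without->getAttribute('href');
$hasHref = $without->hasAttribute('href');

echo $viaNode . "\n";
echo $viaString . "\n";
var_dump($missing, $hasHref);
```

Both routes give the same string when the attribute exists; `getAttribute` is simply the shorter one, and it is safe to call on an element that has no `href`.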
vicTROLLA