
I am trying to parse tabular data from an HTML page.

It has the following structure:

<table>
<tr>
  <td><a href="url">name</a></td>
    <td>Lakeside</td>
    <td>California</td>
    <td>92040</td>
    <td>United States</td>
    <td>Off Road</td>
</tr>
</table>

This is what I have tried:

$dom = new DOMDocument;
$url = "url";

@$dom->loadHTMLFile($url);

$xpath = new DOMXPath($dom);
$xNodes = $xpath->query("//td");

foreach ($xNodes as $xNode)
{
    $sLinktext = @$xNode->firstChild->data;
    echo $sLinktext."<br/>";
    $sLinkurl = $xNode->getAttribute('a');

    if ($sLinktext != '' && $sLinkurl != '')
    {
        echo '<li><a href="' . $sLinkurl . '">' . $sLinktext . '</a></li>';
    }
}

It returns only the text of the td elements; it does not show the href value of the a element.

UPDATE
How can I go to the next level? Each row has an anchor tag. Clicking it leads to another page that contains detailed information about the place. I want to parse that information along with these details. Is it possible to parse multiple pages in one go?

Please guide me. Thank you

Ramaraju.d
  • You are using `$xNode->getAttribute('a')`, but `$xNode` is a `td` node and has no `a` attribute. You have to retrieve the `href` attribute from the `a` child of `$xNode`. – MatRt May 06 '13 at 06:48
  • For one, `a` shouldn't be an attribute of a `td`. You might want to expand your XPath query into something like `//td/a` or something more suitable, depending on the structure you're dealing with. – Havelock May 06 '13 at 06:48
  • If I keep td, it displays the same result. – Ramaraju.d May 06 '13 at 06:50
  • You can also take a look at [phpquery](http://code.google.com/p/phpquery/), which makes this very flexible and easy. – Sammy May 06 '13 at 06:51
  • Are you looking for just the text in every td, or are you trying to analyze the links? – bjelli May 06 '13 at 07:07
  • Only the text from every tr and td. I just want to save the returned result into my database. – Ramaraju.d May 06 '13 at 07:10
  • If you only want the text, then use getElementsByTagName() to get the TR elements and fetch their nodeValue. That will return all text nodes below the TR element. – Gordon May 06 '13 at 07:13
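
Pulling the comments above together, here is a minimal sketch of one way this could be done. It is not a definitive implementation. It assumes the markup looks like the table in the question, that each row holds at most one link, and that the hrefs are absolute URLs. The function names `xpathFromHtml`, `parseRows`, and `addDetails`, and the stub fetcher, are hypothetical and only there for illustration. The href is read from the `a` element inside the row, not from the `td`, and the detail page is fetched per row through a callable so the example can run offline.

```php
<?php
// Sketch only: function names and the sample markup are illustrative assumptions.

/** Load an HTML string into a DOMXPath, collecting parse warnings instead of printing them. */
function xpathFromHtml(string $html): DOMXPath
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // replaces the @ suppression
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return new DOMXPath($dom);
}

/** Return one entry per table row: the href of the row's link (or null) and the text of each cell. */
function parseRows(string $html): array
{
    $xpath = xpathFromHtml($html);
    $rows = [];
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells === []) {
            continue; // skip rows without td cells, e.g. header rows
        }
        // The href belongs to the <a> element inside a cell, not to the <td>.
        $link = $xpath->query('./td/a[@href]', $tr)->item(0);
        $rows[] = [
            'href'  => $link ? $link->getAttribute('href') : null,
            'cells' => $cells,
        ];
    }
    return $rows;
}

/**
 * Fetch the page behind each row's link and attach its body text as 'detail'.
 * Assumption: the whole <body> text is wanted; a real detail page would
 * need a narrower XPath query.
 */
function addDetails(array $rows, callable $fetch): array
{
    foreach ($rows as $i => $row) {
        $rows[$i]['detail'] = null;
        if ($row['href'] === null) {
            continue;
        }
        $html = $fetch($row['href']);
        if (!is_string($html) || $html === '') {
            continue; // fetch failed; leave 'detail' as null
        }
        $body = xpathFromHtml($html)->query('//body')->item(0);
        $rows[$i]['detail'] = $body ? trim($body->textContent) : null;
    }
    return $rows;
}

// Sample markup copied from the question.
$html = <<<HTML
<table>
<tr>
  <td><a href="url">name</a></td>
    <td>Lakeside</td>
    <td>California</td>
    <td>92040</td>
    <td>United States</td>
    <td>Off Road</td>
</tr>
</table>
HTML;

// Stub fetcher so the sketch runs without network access.
$stubFetch = function (string $url) {
    return '<html><body><p>Details for ' . $url . '</p></body></html>';
};

$rows = addDetails(parseRows($html), $stubFetch);
print_r($rows);
```

For real pages, the stub could be swapped for `'file_get_contents'` or a cURL wrapper, and each entry of `$rows` could then be written to the database. Relative hrefs would first have to be resolved against the URL of the listing page, which this sketch does not attempt.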
