The XPath query below works perfectly well with Google Docs' importXML, but it does not work in the following PHP script. If I change the query to a simpler one, the script works as expected. I have been trying to troubleshoot this problem for quite a while and would appreciate any suggestions.

Many thanks in advance!

$file = fopen('info-urls.txt', "r");

$output = array();
$i=1;

while(!feof($file)){
    $line = fgets($file);

    echo $line . '<br/>';
    $doc = new DOMDocument();
    $doc->loadHTMLFile(trim($line));

    $xpath = new DOMXpath($doc);

    $elements = $xpath->query("substring((//*[self::div or self::p or self::li or self::td or self::tr or self::table or self::h4 or self::h4 or self::h3 or self::h2 or self::h1][contains(text(),'boat') or contains(text(),'bike') or contains(text(),'car')]/text())[1], 0, 499)");

    if ($elements->length == 0) {
      $output[] = 'N/A';
    }else{
        foreach ($elements as $element) {
            $nodes = $element->childNodes;
            foreach ($nodes as $node) {
                if(strcmp($node->nodeValue, "")!=0){
                    $output[] = trim($node->nodeValue);
                }
            }
        }
    }
}
array2csv($output);
print_r($output);

function array2csv(array &$array){
    $file = 'descriptions.txt';

    $csvFormat = "";

    for($i=0; $i < sizeof($array); $i++){
        $csvFormat .= $array[$i] . ",\n";
    }
    file_put_contents($file, $csvFormat);
}

Script output (descriptions.txt)

N/A,
N/A,
N/A,
N/A,
N/A,

XPath query that works

//a

AnchovyLegend
  • Please reduce your example to a *single* HTML document (fragment) that reproduces your error (so that the example can be reproduced; this is generally required on SO). You should also add the XPath query that works. – hakre Jul 04 '13 at 23:05

1 Answer

Use $xpath->evaluate() instead of $xpath->query(). Your expression is wrapped in the XPath function substring(), so its result is a scalar string rather than a DOMNodeList, and query() only returns node lists.
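
To make the difference concrete, here is a minimal sketch. The inline HTML is a made-up stand-in for one of the pages listed in info-urls.txt, and the shortened query is only an illustration, not the full expression from the question. It runs a node-set expression through query() and a string expression through evaluate():

```php
<?php
// Hypothetical inline HTML standing in for one of the scraped pages.
$html = '<html><body><p>We rent a boat by the hour.</p><p>Nothing here.</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Node-set expression: query() returns a DOMNodeList that can be iterated.
$nodes = $xpath->query("//p[contains(text(),'boat')]");
echo $nodes->length, "\n";

// String expression: evaluate() returns a plain PHP string, not a node list.
$text = $xpath->evaluate("substring((//p[contains(text(),'boat')]/text())[1], 1, 499)");
var_dump(is_string($text));
echo $text, "\n";
```

Note that XPath string positions start at 1, so substring(..., 0, 499) as written in the question would keep one character fewer than substring(..., 1, 499).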

hek2mgl
  • Thanks for the reply. The change did not work, however. I am getting the following error: ` Notice: Trying to get property of non-object in C:\xampp\htdocs\www\wect\scrape\xpathgenius.php on line 22` – AnchovyLegend Jul 04 '13 at 22:50
  • yep that's the expected behaviour. `evaluate` will return a scalar value. try `var_dump($elements);` (I expect your query will return a string, rather than a node list) – hek2mgl Jul 04 '13 at 22:52
  • I see, although I don't understand why the output to the file stays the same, all `N/A,`. Shouldn't it work if `var_dump` prints out the string from the XPath query? – AnchovyLegend Jul 04 '13 at 22:55
  • Place a `continue;` after the `var_dump()`. What do you see now? – hek2mgl Jul 04 '13 at 23:02
  • Interactive debugging is best done not on Stackoverflow but with a remote debugger, see http://xdebug.org/docs/remote – hakre Jul 04 '13 at 23:05
  • @hakre while `xdebug` is nice, of course, `var_dump()` does a good job as well. Especially when I have no access to the debug session of the OP – hek2mgl Jul 04 '13 at 23:07
  • @hek2mgl: That should be a non-issue, the OP is required to provide a self-contained, short example with the question that allows you to easily reproduce the issue. The OP has failed with this pre-condition. Get the question on-hold until this has been provided. Otherwise it's also not clear what the question is and it is of very low quality and should not be kept on site. – hakre Jul 04 '13 at 23:10
  • @hakre I understand what you mean and upvoted your comment above, but I guess the problem in this case is just the use of `query()` instead of `evaluate()` (and the code after it). In this case it can be answered without seeing the whole XML – hek2mgl Jul 04 '13 at 23:12
  • @AnchovyLegend No problem! :) It once took me hours too, just like you. I think this function is poorly documented. – hek2mgl Jul 04 '13 at 23:14
  • The evaluate function is part of the DOM Level 3 XPath specs. Not that PHP DOMDocument [officially supports that feature](http://stackoverflow.com/a/17340953/367456); however, it is modeled after those specs. You can find them here: http://www.w3.org/TR/DOM-Level-3-XPath/ - The docs are pretty verbose. – hakre Jul 04 '13 at 23:15
  • Will have a look there, thank you! (I seem to remember that *you* were the person who once pointed me towards `evaluate()`, thx ;) – hek2mgl Jul 04 '13 at 23:17
  • Yes, could be :) And yes, string return values require the evaluate method. The query method, however, is documented as returning a DOMNodeList, so it can't return strings. The question would have been greatly improved if it had asked about that specifically. Now there is a title about Google and some XPath "not working", with a really large fragment of code that does not speak well for itself. :/ – hakre Jul 04 '13 at 23:20
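
As the comments point out, once the expression goes through evaluate() the result is a string, so the question's checks on `->length` and `->childNodes` no longer apply. A rough sketch of how the per-page step might be adapted follows. The helper name `describePage()` and the sample HTML are hypothetical, the element list is trimmed for brevity, and the start position is changed to 1 because XPath string positions are 1-based:

```php
<?php
// Hypothetical helper: returns the first matching text (up to 499 characters)
// found in the given HTML, or 'N/A' when nothing matches.
function describePage(string $html): string
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $text = $xpath->evaluate(
        "substring((//*[self::div or self::p or self::li or self::td"
        . " or self::h1 or self::h2 or self::h3 or self::h4]"
        . "[contains(text(),'boat') or contains(text(),'bike') or contains(text(),'car')]"
        . "/text())[1], 1, 499)"
    );

    // evaluate() yields a string for this expression; an empty string means no match.
    $text = is_string($text) ? trim($text) : '';
    return $text === '' ? 'N/A' : $text;
}

echo describePage('<html><body><h2>Rent a bike</h2><p>Other</p></body></html>'), "\n";
echo describePage('<html><body><p>Nothing relevant</p></body></html>'), "\n";
```

In the original loop, the return value of such a helper would be appended to `$output` once per URL, replacing the nested `foreach` over child nodes.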