What's wrong with this HTML DOM PHP code?

Question

I'm trying to write code that prints the contents of all the elements with itemprop="price" from a set of links, but it doesn't work and I can't figure out why. This is the code:

<?php
error_reporting(0);
ini_set('display_errors', 0);
$doc      = new DOMDocument();
$allscan  = array(
    'http://www.mobile54.co.il/30786',
    'http://www.mobile54.co.il/35873',
    'http://www.mobile54.co.il/34722'
);
$alllinks = array();
$html     = file_get_contents($allscan[0]);
$doc->loadHTML($html);
$href = $doc->getElementsByTagName('a');
for ($j = 0; $j < count($allscan); $j++) {
    $html = file_get_contents($allscan[$j]);
    $doc->loadHTML($html);
    $href = $doc->getElementsByTagName('a');
    for ($i = 0; $i < $href->length; $i++) {
        $link = $href->item($i)->getAttribute("href");
        $lin  = preg_replace('/\s+/', '', 'http://www.mobile54.co.il' . $link . "<br />");
        if (strpos($link, 'items/') && !strpos($link, '#techDetailsAName')) {
            if (!in_array($lin, $alllinks)) {
                $alllinks[] = $lin;
            }
        }
    }
}

for ($i = 0; $i < count($alllinks); $i++) {
    echo $alllinks[$i];
}
for ($i = 0; $i < count($alllinks); $i++) {
    $lin  = "$alllinks[$i]";
    $html = file_get_contents($lin);
    $doc->loadHTML('<?xml encoding="UTF-8"?>' . $html);
    $span = $doc->getElementsByTagName('span');
    for ($j = 0; $j < $span->length; $j++) {
        $attr = $span->item($j)->getAttribute('itemprop');
        if ($attr == "price") {
            echo $span->item($j)->textContent . "<br />";
        }
    }
}


?>

When I paste "someurl" instead of $lin it works, but the other way doesn't. I also tried $html = file_get_contents($alllinks[$i]); but that didn't work either, and I don't know why.

miken32 · Accepted Answer · 2017-09-13T17:00:58.303

I think your problem is probably that you appended a <br /> to the end of each URL before storing it in $alllinks, so file_get_contents() is later asked to fetch an address that ends in markup. But there are also a lot of opportunities to improve your code with XPath. (Note also that you can pass a URL directly to the DomDocument object with loadHtmlFile().)
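To make the suspected cause concrete, here is a minimal sketch of what ends up in $alllinks. The path /items/12345 is a made-up example, not a real link from the site; the rest mirrors the string handling in the question.

```php
<?php
// Hypothetical relative link, standing in for an href scraped from the page.
$link = '/items/12345';

// What the question's code stores: the markup becomes part of the URL,
// and the whitespace removal also strips the space inside "<br />".
$broken = preg_replace('/\s+/', '', 'http://www.mobile54.co.il' . $link . "<br />");

// Keep the stored URL clean and add the line break only when printing.
$clean = 'http://www.mobile54.co.il' . trim($link);

echo $broken, "\n";       // http://www.mobile54.co.il/items/12345<br/>
echo $clean, "<br />\n";  // the URL followed by the markup, for display only
```

With error_reporting(0) in place, any warning from a failed request would be hidden, which may be why the script seemed to fail silently.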

First we pull the href of every <a> element whose attribute value matches. Then we load each of those URLs, search the page for elements whose itemprop attribute is exactly "price", and take the text content of the first match.

<?php
$url = "http://www.mobile54.co.il/30786";
$prices = [];
$hrefs = [];
$combined = [];

$dom = new DomDocument;
libxml_use_internal_errors(true);
$dom->loadHtmlFile($url);
$xpath = new DomXPath($dom);
// get <a> elements with href containing items/ but not #techDetailsAName
$nodes = $xpath->query("//a[contains(@href, 'items/') and not(contains(@href, '#techDetailsAName'))]/@href");
foreach ($nodes as $node) {
    $hrefs[] = trim($node->value);
}

// now you have a list of URLs
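foreach ($hrefs as $k => $href) {
    // write back by key rather than by reference, so no reference lingers after the loop
    $hrefs[$k] = "http://www.mobile54.co.il$href";
    $dom->loadHtmlFile($hrefs[$k]);
    $xpath = new DomXPath($dom);
    // get any element with itemprop of price
    $nodes = $xpath->query("//*[@itemprop='price']");
    // a page without a price element yields null instead of an error
    $prices[$k] = $nodes->length ? $nodes->item(0)->textContent : null;
}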

// now you have $hrefs and $prices, combine them:
foreach ($hrefs as $k=>$v) {
    $combined[$k] = [$hrefs[$k], $prices[$k]];
}
print_r($combined);
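If the XPath expressions are unfamiliar, it may help to try them on a small inline document first, without any network access. The sketch below uses made-up markup (the links and the price value are assumptions, not taken from the real site) and runs the same two queries as above.

```php
<?php
// Hypothetical markup, standing in for a downloaded product page.
$html = <<<HTML
<html><body>
  <a href="/items/1">Phone</a>
  <a href="/items/1#techDetailsAName">Specs</a>
  <a href="/about">About</a>
  <span itemprop="price">199</span>
</body></html>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select the href attribute nodes directly: the path must contain "items/"
// and must not contain the "#techDetailsAName" anchor.
$hrefs = [];
$query = "//a[contains(@href, 'items/') and not(contains(@href, '#techDetailsAName'))]/@href";
foreach ($xpath->query($query) as $attr) {
    $hrefs[] = trim($attr->value);
}

// Any element type with itemprop="price" matches, not only <span>.
$priceNode = $xpath->query("//*[@itemprop='price']")->item(0);
$price = $priceNode !== null ? trim($priceNode->textContent) : null;

print_r($hrefs);  // only /items/1 passes the filter
var_dump($price); // string(3) "199"
```

The filtering that the question does with strpos() inside nested loops is expressed here in the query itself, which is most of why the XPath version is shorter.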

First of all, thanks a lot! It solved the problem; I hadn't seen it. Second, I don't really understand your code or what you did there... how is it better than the other? (faster?) — U.azar, Apr 07 '17 at 19:31
You can see that the code is shorter, but also it's simpler to read and it gives better performance. Learning XPath can be tricky but there are lots of references online. — miken32, Apr 07 '17 at 19:34
