I want to use this code to count the number of dt elements with class "level3" inside a certain node:

include_once('simple_html_dom.php');
ini_set("memory_limit", "-1");
ini_set('max_execution_time', 1200);

function parseInit($url) {
  $ch = curl_init();
  $timeout = 0;
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);     
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); 
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
  $data = curl_exec($ch);
  curl_close($ch);
  return $data;
}

$data = parseInit("https://www.smile-dental.de/index.php");
$html = new simple_html_dom();
$html = $html->load($data);
$struct = $html->find("dt.level1", 0)->next_sibling()->find("dt.level2", 0)->next_sibling()->find("dt.level3");
echo count($struct);
$html->clear();  
unset($html);

But the result is wrong. The correct count is 2, but I get 53 (the total number of dt elements with class "level3" within the first dt node with class "level1"). Could you help me and explain what the problem is?

Thanks in advance!

---EDIT--- In general, I want to build a hierarchical structure of the links in the left navigation bar. I wrote the function below, but it gives wrong results, perhaps because of the problem described above. There may also be other problems in the code.

function get_links($struct) {
    static $iter = 1;
    $nav_left_links = $struct->find("dt.level".$iter);
    echo "<ul>";   
    foreach ($nav_left_links as $links) {
        echo "<li>".$links->find("a", 0)->href;
        echo str_pad('',4096)."\n";
        ob_flush();
        flush();
        usleep(500000);
        $iter++;
        if ($links->next_sibling() && count($links->next_sibling()->find("dt")) > 0) {
            get_links($links->next_sibling());
        } else {
            $iter--;
            if ($key == count($nav_left_links)) {
                break;
            } else {
                continue;   
            }
        }
        echo "</li>";  
    }
    echo "</ul>";
    $iter--;
}

$data = parseInit("https://www.smile-dental.de/index.php");
$html = new simple_html_dom();
$html = $html->load($data);
$struct = $html->find(".mod_vertical_dropmenu_142_inner", 0);
get_links($struct);
$html->clear();  
unset($html); 

Alternatively, if somebody knows how to rewrite this code without PHP Simple HTML DOM, using standard parsing methods, I would be very grateful.

jekahm

1 Answer

Unfortunately, it looks like you have uncovered a bug. I did some experiments, and even after correcting the validation errors, simple-html-dom wasn't able to traverse the dl, dt, and dd elements properly. I did get it to work when I used a regex to convert all the dl elements to ul, and the dd and dt elements to li, though:

Result of $html->find("li.level1", 1)->find("li.level2", 1)->find("li.level3"):

<li class="level3 off-nav-321-8120 notparent first"><span class="outer"> <span class="inner"> <a href="/index.php?option=com_virtuemart&amp;view=productdetails&amp;virtuemart_category_id=321&amp;virtuemart_product_id=8120"><span>Pro-Seal Versiegeler</span></a> </span> </span></li>
<li class="level3 off-nav-321-8120 notparent first"></li>
<li class="level3 off-nav-321-8122 notparent last"><span class="outer"> <span class="inner"> <a href="/index.php?option=com_virtuemart&amp;view=productdetails&amp;virtuemart_category_id=321&amp;virtuemart_product_id=8122"><span>Pro-Seal L.E.D. Versiegeler</span></a> </span> </span></li>
<li class="level3 off-nav-321-8122 notparent last"></li>
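
The tag conversion described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the sample markup in $data is hypothetical and stands in for the fetched page, and the two patterns are one plausible way to write the replacement, not necessarily the exact regex used in the experiment.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$data = '<dl class="level1">'
      . '<dt class="level1 parent"><a href="/a">A</a></dt>'
      . '<dd class="level1 parent"><dl class="level2">'
      . '<dt class="level2"><a href="/b">B</a></dt><dd class="level2"></dd>'
      . '</dl></dd>'
      . '</dl>';

// Rewrite opening and closing <dl> tags to <ul>; attributes are left untouched.
$converted = preg_replace('/<(\/?)dl\b/i', '<$1ul', $data);
// Rewrite opening and closing <dt> and <dd> tags to <li>.
$converted = preg_replace('/<(\/?)d[td]\b/i', '<$1li', $converted);

echo $converted, "\n";
```

Because both the dt and the dd of a menu entry become li elements with the same classes, each entry shows up twice in the result, as in the output above.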
i alarmed alien
  • I've edited the code and question above to explain what I need to get exactly. – jekahm Aug 27 '14 at 10:59
  • Since the dl / dt / dd parsing doesn't appear to work properly, why don't you convert all the dl, dt, and dd tags to divs, and add the old tag type to the class (e.g. class="level3dt ...", class="level1dl ...")? You'll still be able to traverse and filter the elements that way. – i alarmed alien Aug 27 '14 at 11:38
  • Could you help me do this? I'm still not good at using regexps. Thanks in advance! – jekahm Aug 27 '14 at 12:51
  • You could use these regexps to replace the `dt`, `dd`, and `dl` tags with `div`s: `$data = preg_replace('/<(d[ldt])( |>)/smi', '<div data-type="$1"$2', $data); $data = preg_replace('/<\/d[ldt]>/smi', '</div>', $data);`
    – i alarmed alien Aug 27 '14 at 13:13
  • But now I have another problem. For example, when I use ->find("div['data-type'=dt].level2"); I don't get the elements with class LEVEL1 and data-type=DT, but all elements with this class name (including data-type=DL and data-type=DD). Maybe this is because the parser can't handle DATA attributes? – jekahm Aug 27 '14 at 14:33
  • The syntax you're using might be wrong: it should be `find("div.level2[data-type=dt]")` to get elements with class `level2` and data-type `dt`. – i alarmed alien Aug 27 '14 at 14:56
  • To "i alarmed alien": Sorry for asking so many questions, but do you know what is wrong with my code above (for building the hierarchical structure of links)? It produces inaccurate results. Thanks in advance! – jekahm Aug 27 '14 at 16:16
  • Good, glad that you got it sorted! Please click the tick by the answer to show that it worked. :) – i alarmed alien Aug 27 '14 at 16:19
  • Sorry for reporting success too early. Some problems have come up again. If I use this syntax: find("div.level2[data-type=dd]"), I get ALL elements with class name LEVEL2, whatever their data attribute (DT, DD, DL). If I use another syntax: find("div.level2[data-type=dd]"), I get all elements with data attribute DD, but with class names LEVEL1, LEVEL2, LEVEL3, etc. – jekahm Aug 28 '14 at 08:29
  • You have used the same syntax in both the queries in your comment! – i alarmed alien Aug 28 '14 at 09:35
  • Sorry, I mean: `find("div[data-type=dd].level2")` returns ALL elements with class name LEVEL2, whatever their data attribute (DT, DD, DL); `find("div.level2[data-type=dd]")` returns all elements with data attribute DD, whatever their class name (LEVEL1, LEVEL2, LEVEL3, etc.). – jekahm Aug 28 '14 at 09:46
  • Will you create a new question about this? It's difficult to answer properly in the comments section. :\ – i alarmed alien Aug 28 '14 at 10:02
  • I've created one. Link: http://stackoverflow.com/questions/25546733/problems-with-multiple-attributes-while-using-php-simple-html-dom – jekahm Aug 28 '14 at 10:38
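
For readers following the comment thread, here is a minimal sketch of the div conversion discussed there, with the old tag name kept in a data-type attribute. Note the swap: instead of Simple HTML DOM selectors, it filters with PHP's built-in DOMDocument and DOMXPath, which is one possible "standard" route and not something the thread itself confirms. The sample markup, the variable names, and the exact patterns are assumptions made for illustration.

```php
<?php
// Hypothetical sample markup standing in for the fetched navigation menu.
$data = '<dl class="level1"><dt class="level1"><a href="/a">A</a></dt>'
      . '<dd class="level1"><dl class="level2">'
      . '<dt class="level2 first"><a href="/b">B</a></dt><dd class="level2 first"></dd>'
      . '<dt class="level2 last"><a href="/c">C</a></dt><dd class="level2 last"></dd>'
      . '</dl></dd></dl>';

// Opening tags: <dt class="..."> becomes <div data-type="dt" class="...">.
$converted = preg_replace('/<(d[ldt])( |>)/i', '<div data-type="$1"$2', $data);
// Closing tags: </dl>, </dt> and </dd> all become </div>.
$converted = preg_replace('/<\/d[ldt]>/i', '</div>', $converted);

// Parse with the built-in DOM extension; suppress warnings about loose markup.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($converted);
libxml_clear_errors();

// Select divs that have data-type="dt" AND carry the class token "level2".
$xpath = new DOMXPath($doc);
$query = '//div[@data-type="dt"]'
       . '[contains(concat(" ", normalize-space(@class), " "), " level2 ")]';

$hrefs = [];
foreach ($xpath->query($query) as $node) {
    // Take the first link inside each matched element, if there is one.
    $link = $xpath->query('.//a', $node)->item(0);
    if ($link !== null) {
        $hrefs[] = $link->getAttribute('href');
    }
}

echo count($hrefs), "\n";
print_r($hrefs);
```

The XPath predicate checks the class as a whole token, so "level2" does not accidentally match a class such as "level20". The data-type value is copied from the source tag as written, so uppercase tags in the input would need lowercasing before the comparison.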