
I want to build a news web crawler. I want to load the page at http://vnexpress.net/tin-tuc/ban-doc-viet/xa-hoi/chay-xe-may-theo-taxi-moi-biet-bi-chem-60-000-dong-2865724.html and get everything inside the div with class fck_detail, keeping its original tags. How can I do this?

    <div class="fck_detail">
    <p class="Normal" style="text-align:justify;">Some texts</p>
    <p class="Normal" style="text-align:justify;">some texts</p>
    <p class="Normal" style="text-align:justify;">Some texts</p>
    <p class="Normal" style="text-align:justify;">Some texts</p>
    </div>

I tried the following, without success:

    $doc = new DOMDocument();
    $doc->loadHTMLFile("http://example.com/some.html");
    $selector = new DOMXpath($doc);   
    $node = $selector->query('//div[@class="fck_detail"]')->item(0);
    echo trim($node->nodeValue);

The above code returns only the plain text, with all HTML stripped. I want to keep the HTML.

Gordon
  • I would recommend http://simplehtmldom.sourceforge.net/, it is very easy and looks like jQuery – guiligan Aug 16 '13 at 21:30
  • @guiligan it's also slow and memory hungry. Besides, if you want something that really looks like jQuery, you have to use phpQuery. Or even better install ext/v8 and then run jQuery in PHP because everyone knows you can never have enough jQuery … not! – Gordon Aug 16 '13 at 21:38
  • In case you want the outerHTML instead of the innerHTML see http://stackoverflow.com/questions/5404941/php-domdocument-outerhtml-for-element – Gordon Aug 17 '13 at 06:47
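
One possible way to keep the markup, sketched here under some assumptions: `nodeValue` only ever returns text content, whereas `DOMDocument::saveHTML()` accepts a node argument (PHP 5.3.6 and later) and serializes it with its tags. The helper name `innerHtml` and the inline `$source` string are hypothetical, used only so the example runs on its own; for the real page you would go back to `loadHTMLFile()` with the article URL.

```php
<?php
// Hypothetical helper: returns the markup inside $node (its "innerHTML")
// by serializing each child node in turn.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) serializes just that child, tags included.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Inline sample standing in for the downloaded page (an assumption for this demo).
$source = '<html><body><div class="fck_detail">'
    . '<p class="Normal">Some texts</p><p class="Normal">some texts</p>'
    . '</div></body></html>';

libxml_use_internal_errors(true); // real-world pages often trigger parser warnings
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

$selector = new DOMXPath($doc);
$node = $selector->query('//div[@class="fck_detail"]')->item(0);

if ($node !== null) {
    $inner = trim(innerHtml($node)); // the <p> elements only
    $outer = $doc->saveHTML($node);  // the <div> itself plus its contents
    echo $inner, PHP_EOL;
}
```

Passing the div itself to `saveHTML()` gives the outerHTML mentioned in the comment above; looping over `childNodes` gives only what is inside it.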
