1

I am trying to read a website's content, but I have a problem: I want to get images, links and similar elements, and I want the elements themselves, not their content. For instance, for a link I want to get the entire element.

How can I do this?

<?php

    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, "http://www.link.com");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    $output = curl_exec($ch);

    $dom = new DOMDocument;
    @$dom->loadHTML($output);

    $items = $dom->getElementsByTagName('a');

    for($i = 0; $i < $items->length; $i++) {
        echo $items->item($i)->nodeValue . "<br />";
    }

    curl_close($ch);
?>
Sterling Duchess
  • Not to stray from your question, but I suggest using PHP Simple HTML DOM Parser. It makes coding like this much simpler. http://simplehtmldom.sourceforge.net/manual.htm – Norse May 10 '12 at 00:39
  • I know about it and it would make my life easier, but I think part of the code could be sold and I don't know if I can ship the library with it. – Sterling Duchess May 10 '12 at 00:41
  • I need to know how to get an entire element. – Sterling Duchess May 10 '12 at 00:42

2 Answers

4

You appear to be asking for the serialized HTML of a DOMElement, i.e. you want a string containing <a href="http://example.org">link text</a>? (Please make your question clearer.)

$url = 'http://example.com';
$dom = new DOMDocument();
$dom->loadHTMLFile($url);

$anchors = $dom->getElementsByTagName('a');

foreach ($anchors as $a) {
    if (version_compare(PHP_VERSION, '5.3.6', '>=')) {
        // Best solution, but only works with PHP >= 5.3.6
        $htmlstring = $dom->saveHTML($a);
    } else {
        // Otherwise you need to serialize to XML and then fix the self-closing elements
        $htmlstring = saveHTMLFragment($a);
    }
    echo $htmlstring, "\n";
}


function saveHTMLFragment(DOMElement $e) {
    $selfclosingelements = array('></area>', '></base>', '></basefont>',
        '></br>', '></col>', '></frame>', '></hr>', '></img>', '></input>',
        '></isindex>', '></link>', '></meta>', '></param>', '></source>',
    );
    // This is not 100% reliable because it may output namespace declarations.
    // But otherwise it is extra-paranoid to work down to at least PHP 5.1
    $html = $e->ownerDocument->saveXML($e, LIBXML_NOEMPTYTAG);
    // in case any empty elements are expanded, collapse them again:
    $html = str_ireplace($selfclosingelements, '>', $html);
    return $html;
}

However, note that what you are doing is dangerous because it could potentially mix encodings. It is better to have your output as another DOMDocument and use importNode() to copy the nodes you want. Alternatively, use an XSL stylesheet.
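The importNode() suggestion could look roughly like the sketch below. It is only an illustration: the inline HTML string, the wrapping div and the variable names are assumptions made up for the example, and saveHTML() with a node argument still requires PHP 5.3.6 or newer.

```php
<?php
// Hypothetical input; in practice this would be the fetched page.
$html = '<html><body><p><a href="http://example.org">link text</a> '
      . '<img src="logo.png" alt="logo"></p></body></html>';

$source = new DOMDocument();
// Suppress warnings that real-world markup tends to trigger.
@$source->loadHTML($html);

// Separate output document with one explicit encoding.
$output = new DOMDocument('1.0', 'UTF-8');
// Hypothetical container element that collects the copied nodes.
$list = $output->appendChild($output->createElement('div'));

foreach ($source->getElementsByTagName('a') as $a) {
    // true = deep copy, so the child nodes are imported as well.
    $list->appendChild($output->importNode($a, true));
}

// Serialize only the container (needs PHP >= 5.3.6).
$result = $output->saveHTML($list);
echo $result, "\n";
```

Because every node ends up in a single document, the serialization is done by one DOMDocument with one declared encoding instead of by concatenating strings from different sources.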

Francis Avila
0

I'm assuming you just copy-pasted some example code and didn't bother trying to learn how it actually works...

Anyway, the ->nodeValue part takes the element and returns its text content (for an element, that is the concatenated text of all its descendant nodes, with the markup dropped).

So, just remove the ->nodeValue and you have your element.
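A small sketch of the difference, using made-up markup: nodeValue gives only the text, while the element object itself cannot be echoed directly (PHP reports that a DOMElement could not be converted to string), so it has to be serialized first, for example with saveHTML() and a node argument on PHP 5.3.6 or newer.

```php
<?php
$dom = new DOMDocument();
// Hypothetical markup, used only for illustration.
$dom->loadHTML('<html><body><a href="http://example.org">link <b>text</b></a></body></html>');

$a = $dom->getElementsByTagName('a')->item(0);

// Text of the element and all of its descendants, tags dropped.
$text = $a->nodeValue;

// Serialized markup of the element itself (needs PHP >= 5.3.6).
$markup = $dom->saveHTML($a);

echo $text, "\n";
echo $markup, "\n";
```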

Niet the Dark Absol
  • That's the thing, I can't print it out. It says it's not a string. – Sterling Duchess May 10 '12 at 00:44
  • "Object of class DOMElement could not be converted to string in" – Sterling Duchess May 10 '12 at 00:44
  • You want the element, `DOMElement` is the element. It's not a string so I'm not sure what you expect it to print. Edit your question with example desired output so we don't have to guess what you are trying to say. – Francis Avila May 10 '12 at 00:53
  • Well, I am reading an img element right now and I want to print it out to show the image. So I read the img element and its URL, and if I echo it out it should show the image. – Sterling Duchess May 10 '12 at 00:58
  • In that case, assuming you have PHP 5.3.6 or newer, use `$dom->saveHTML($items->item($i));` ([docs](http://php.net/manual/en/domdocument.savehtml.php)) – Niet the Dark Absol May 10 '12 at 01:00