Given a DOMDocument object passed in as a parameter, as in the class below:

class Comparison {

    public function __construct(DOMDocument $domDocument){
        $anchors = $domDocument->getElementsByTagName('a');
        // DOMNodeList exposes its size through ->length
        if($anchors->length > 0){
            foreach($anchors as $anchor){
                $original = ''; // Not sure how to get this
                $ordered = $this->rearrangeAttributes($anchor);
                $difference = $this->diff($original,$ordered);
                echo 'Original Source: '.$original."\n";
                echo 'Ordered Source: '.$ordered."\n";
                echo 'Difference: '.$difference."\n\n";
            }
        }
    }

}

How do you get the original HTML string indicated by $original?

My current approach is based on the DOMNode documentation: http://php.net/manual/en/class.domnode.php

The idea is to get the parent of the node in question and read its innerHTML. However, the parser alters the original source to some degree during conversion, so this doesn't look like a robust way to do it. Is there a more effective way? I can pass in the raw HTML as well, but I'd like to avoid that rabbit hole if there's a known solution.

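UPDATE: If you want the (cleaned) source of the node or its parent and the original doesn't matter, then the question Marc B linked is very useful: How to return outer html of DOMDocument? (http://stackoverflow.com/questions/5404941/how-to-return-outer-html-of-domdocument)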
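
For reference, a minimal sketch of that cleaned outer-HTML route, assuming PHP 5.3.6 or later (where `DOMDocument::saveHTML()` accepts a node argument). The `$raw` string is a made-up input, not taken from any real page. The point is only that the markup comes back re-serialized by libxml rather than as it was written:

```php
<?php
// Hypothetical raw input: uppercase tag, single quotes, unquoted attribute.
$raw = "<p><A href='/x' class=link>Go</A></p>";

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($raw);
libxml_clear_errors();

$anchor = $doc->getElementsByTagName('a')->item(0);

// Outer HTML of this one node, as serialized by libxml (post-parsing form).
$outer = $doc->saveHTML($anchor);

echo 'Raw:        ', $raw, "\n";
echo 'Serialized: ', $outer, "\n";
```

The serialized anchor has a lowercase tag name and double-quoted attribute values, so it cannot stand in for `$original` when the goal is to diff against the source as written.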

But if you want the original source, you can try getting the node's line number with DOMNode::getLineNo() (http://php.net/manual/en/domnode.getlineno.php), although it's not clear whether that number refers to the cleaned source or to the original raw source. Insight welcome!
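
A rough sketch of the line-number idea, assuming the raw HTML is passed in alongside the DOMDocument and assuming (unverified here) that `getLineNo()` refers to lines of the string given to `loadHTML()`. `rawAnchorOnLine()` is a hypothetical helper, and it only handles anchors that open and close on a single line:

```php
<?php
/**
 * Hypothetical helper: return the first raw <a ...>...</a> snippet found on
 * the given 1-based line of $raw, or null if the line is out of range or
 * holds no complete anchor.
 */
function rawAnchorOnLine(string $raw, int $line): ?string
{
    $lines = preg_split('/\R/', $raw);
    if ($line < 1 || $line > count($lines)) {
        return null; // unusable line number, e.g. 0
    }
    // Case-insensitive, non-greedy match of one anchor element.
    if (preg_match('~<a\b[^>]*>.*?</a\s*>~is', $lines[$line - 1], $m)) {
        return $m[0];
    }
    return null;
}

// Made-up input with the anchor on line 3.
$raw = "<div>\n<p>intro</p>\n<A HREF='/x' class=link>Go</A>\n</div>";

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($raw);
libxml_clear_errors();

foreach ($doc->getElementsByTagName('a') as $anchor) {
    $line = $anchor->getLineNo();
    $original = rawAnchorOnLine($raw, $line); // null when nothing matches
    echo 'Line ', $line, ': ', var_export($original, true), "\n";
}
```

If the reported line does not point at the anchor (multi-line tags, several anchors on one line, or a parser that reports a different position), the helper returns null or the wrong snippet, so the result is a starting point for a search in the raw string rather than an exact lookup.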

MyStream
  • http://stackoverflow.com/questions/5404941/how-to-return-outer-html-of-domdocument – Marc B Jun 16 '15 at 14:16
  • @Marc B: From what I could tell, that same function doesn't preserve the original source; it returns a modified (corrected) version of it. Perhaps I'm wrong? – MyStream Jun 16 '15 at 14:20
  • yeah, it'll be a post-parsing version of the html, e.g. with all syntax warts fixed up. I don't think the DOM object stores the raw original html anywhere. you could try a `var_dump($domDocument)` and dig around, though. – Marc B Jun 16 '15 at 14:21
  • That's what I was thinking as well. I can't seem to see any reference back to the original source line or source string from which the element or node was produced. It appears to be one-way only. I can get line number from the source, which appears to be the most useful reference available as a starting point for digging around, but I'm not sure if that's a normalised source or raw source. – MyStream Jun 16 '15 at 14:23

0 Answers