How can I get back to the original DOM after being affected by javascript

Question

Imagine I have a loaded HTML page that has already been modified by JavaScript adding/deleting dynamic elements or adding new classes/attributes/ids to elements during initialization (e.g., in the original source code the [html] tag has no classes; after the JavaScript loads, the [html] tag has class="no-responsive full-with"). Imagine that after that I add/amend some id values manually (through my app). And imagine I need to be able to save the original source code in a database (without any of the JavaScript amendments) but with the id attributes I added manually.

Basically I need to add a given id attribute to an element within the source code of an HTML page, loaded through PHP.

Do you guys have any idea of how to do such a thing?

http://stackoverflow.com/questions/4397577/get-original-innerhtml-source-without-the-javascript-generated-contents — pawel, Jul 10 '14 at 08:00
@pawel—I doubt that will work in a non–trivial page, unless it's been designed to be treated that way. — RobG, Jul 10 '14 at 08:06
you can ajax the orig html from location.href, build an xpath to your changed element, find that xpath in the domdocument you ajax'd, old.parentNode.insertBefore(new,old);old.remove(); serialize dom to string, save. google for a getXpath(elm) function... — dandavis, Jul 10 '14 at 08:16
@dandavis would the xpath work with a fragment too? I am using jQuery. — human, Jul 10 '14 at 09:51
@human: i don't think fragments have all the dom methods that documents do, so it might be hard to apply an xpath to a fragment using existing code that depends on stuff like getElementsByTagName() being available. — dandavis, Jul 10 '14 at 19:28

T.J. Crowder · Accepted Answer · 2014-07-10T12:32:23.547

There's no simple solution here. The exact nature of the complex solution will be determined by your full set of requirements.

Updated Concept

You've said that in addition to changing things, you'll also be adding elements and removing them. So you can't relate the changed elements to the originals purely structurally (e.g., by child index), since those may change.

So here's how I'd probably approach it:

Immediately after the page is loaded, before any modifications are made, give every element in the page a unique identifier. This is really easy with jQuery (and not particularly hard without it):

var uniqueId = 0;
$("*").attr("data-uid", function() {
    return ++uniqueId;
});
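For the "without jQuery" case, a minimal sketch might look like the following. The helper name assignUids is hypothetical (it is not from the code above), and the browser part assumes a browser in which the result of querySelectorAll can be iterated with for...of.

```javascript
// Hypothetical helper (not part of the jQuery snippet above): stamps every
// element it is given with a sequential data-uid attribute.
function assignUids(elements) {
    let uniqueId = 0;
    for (const element of elements) {
        element.setAttribute("data-uid", String(++uniqueId));
    }
    return uniqueId; // how many elements were tagged
}

// In a browser, tag every element currently in the document.
if (typeof document !== "undefined") {
    assignUids(document.querySelectorAll("*"));
}
```

As with the jQuery version, this has to run before anything else modifies the page, otherwise the numbering will not describe the original structure.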

Now every element on the page has a unique identifier. Next, copy the DOM and get a jQuery wrapper for it:

var clone = $("html").clone();

Now you have a reliable way to relate elements in the DOM with their original versions (our clones), via the unique IDs. Allow the user to make changes.

When you're ready to find out what changes were made, you do this:

// Look for changes
clone.find("*").addBack().each(function() {
    // Get this clone's unique identifier
    var uid = $(this).attr("data-uid");

    // Get the real element corresponding to it, if it's
    // still there
    var elm = $('[data-uid="' + uid + '"]')[0];

    // Look for changes
    if (!elm) {
        // This element was removed
    }
    else {
        if (elm.id !== this.id) {
            // This element's id changed
        }
        if (elm.className !== this.className) {
            // This element's className changed
        }
        // ...and so on...
    }
});

That will tell you about removed and changed elements. If you also want to find added elements, just do this:

var added = $(":not([data-uid])");

...since they won't have the attribute.
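If it helps to see the comparison separated from the DOM walking, here is a rough sketch of the same idea over plain snapshots. The names takeSnapshot and diffSnapshots are hypothetical, and only id and className are compared, as in the loop above. Added elements never appear in a snapshot because they have no data-uid, so they still need the :not([data-uid]) lookup.

```javascript
// Hypothetical helper: records id and className for every element that
// carries a data-uid, keyed by that uid.
function takeSnapshot(elements) {
    const snapshot = new Map();
    for (const element of elements) {
        const uid = element.getAttribute("data-uid");
        if (uid !== null) {
            snapshot.set(uid, { id: element.id, className: element.className });
        }
    }
    return snapshot;
}

// Hypothetical helper: compares two snapshots and reports which uids
// disappeared and which ones had their id or className changed.
function diffSnapshots(before, after) {
    const removed = [];
    const changed = [];
    for (const [uid, was] of before) {
        const now = after.get(uid);
        if (!now) {
            removed.push(uid);
        } else if (now.id !== was.id || now.className !== was.className) {
            changed.push({ uid: uid, was: was, now: now });
        }
    }
    return { removed: removed, changed: changed };
}

// Browser usage, assuming the `clone` jQuery object from above:
//   var before = takeSnapshot(clone.find("*").addBack().get());
//   var after = takeSnapshot(document.querySelectorAll("[data-uid]"));
//   var report = diffSnapshots(before, after);
```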

You can use the information in clone to reconstruct the original DOM's string:

clone.find("[data-uid]").addBack().removeAttr("data-uid");
var stringToSend = clone[0].outerHTML;

(outerHTML is supported by any vaguely modern browser; the last to add it was Firefox, in v11.)

...and of course the information above to record changes.
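Since the goal in the question is the original markup plus the id values added afterwards, one option is to copy those ids from the live elements onto their clones before the data-uid attributes are stripped and the string is built. This is only a sketch: copyIdsToClone is a hypothetical name, and it assumes that every non-empty id found on a live element is one you want to keep. It cannot tell ids added through your app apart from ids added by other scripts.

```javascript
// Hypothetical helper: for each cloned element, looks up the live element
// with the same data-uid and copies its id onto the clone when it differs.
// Empty ids are skipped so no empty id attribute is written to the clone.
function copyIdsToClone(cloneElements, findLiveByUid) {
    let copied = 0;
    for (const cloneElement of cloneElements) {
        const uid = cloneElement.getAttribute("data-uid");
        const liveElement = uid === null ? null : findLiveByUid(uid);
        if (liveElement && liveElement.id && liveElement.id !== cloneElement.id) {
            cloneElement.id = liveElement.id;
            copied++;
        }
    }
    return copied; // number of ids written to the clone
}

// Browser usage, assuming the `clone` jQuery object from above. Run this
// before removing data-uid from the clone:
//   copyIdsToClone(clone.find("*").addBack().get(), function (uid) {
//       return document.querySelector('[data-uid="' + uid + '"]');
//   });
```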

Live proof of concept

HTML:

<p class="content">Some content</p>
<p class="content">Some further content</p>
<p>Final content</p>
<input type="button" id="makeChange" value="Make Change">
<input type="button" id="seeResults" value="See Results">

JavaScript:

// Probably unnecessary, but I wanted a scoping
// function anyway, so we'll give the parser time
// to completely finish up.
setTimeout(function() {
    // Assign unique identifier to every element
    var uniqueId = 0;
    $("*").attr("data-uid", function() {
        return ++uniqueId;
    });

    // Clone the whole thing, get a jQuery object for it
    var clone = $("html").clone();

    // Allow changes
    $("#makeChange").click(function() {
        this.disabled = true;
        $("p:eq(1)").attr("id", "p1");
        $("p:eq(2)").addClass("foo");
        alert("Change made, set an id on one element and added a class to another");
    });

    // See results
    $("#seeResults").click(function() {
        this.disabled = true;

        // Look for changes
        clone.find("*").addBack().each(function() {
            // Get this clone's unique identifier
            var uid = $(this).attr("data-uid");

            // Get the real element corresponding to it, if it's
            // still there
            var elm = $('[data-uid="' + uid + '"]')[0];

            // Look for changes
            if (!elm) {
                display("Element with uid " + uid + ": Was removed");
            }
            else {
                if (elm.id !== this.id) {
                    display("Element with uid " + uid + ": <code>id</code> changed, now '" + elm.id + "', was '" + this.id + "'");
                }
                if (elm.className !== this.className) {
                    display("Element with uid " + uid + ": <code>className</code> changed, now '" + elm.className + "', was '" + this.className + "'");
                }
            }
        });
    });

    function display(msg) {
        $("<p>").html(String(msg)).appendTo(document.body);
    }
}, 0);

Earlier Answer

Assuming the server gives you the same text for the page every time it's asked, you can get the unaltered text client-side via ajax. That leaves us with the question of how to apply the id attributes to it.
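The retrieval step might look something like the sketch below. It uses the fetch API rather than jQuery's ajax helpers, which is a substitution on my part. fetchOriginalSource is a hypothetical name, and the sketch assumes the page can simply be re-requested with a GET to the same URL.

```javascript
// Hypothetical helper: re-requests a URL and resolves with the response body
// as text. fetchImpl is a parameter only so the function can be exercised
// without a network; in a browser the default global fetch is used.
async function fetchOriginalSource(url, fetchImpl = fetch) {
    // "no-store" asks the browser not to answer from its HTTP cache.
    const response = await fetchImpl(url, { cache: "no-store" });
    if (!response.ok) {
        throw new Error("Request failed with status " + response.status);
    }
    return response.text();
}

// Browser usage:
//   fetchOriginalSource(location.href).then(function (stringFromServer) {
//       // work with the unaltered source here
//   });
```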

If you need the original contents but not necessarily identical source (e.g., it's okay if tag names change case [div might become DIV], or attributes gain/lose quotes around them), you could use the source from the server (retrieved via ajax) to populate a document fragment, and apply the id values to the fragment at the same time you apply them to the main document. Then send the source of the fragment to the server.

Populating a fragment with the full HTML from your server is not quite as easy as it should be. Assuming html doesn't have any classes or anything on it, then:

var frag, html, prefix, suffix;
frag = document.createDocumentFragment();
html = document.createElement("html");
frag.appendChild(html);
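// [\s\S]*? is used rather than .* because . does not match line breaks,
// and the doctype is usually on its own line before the html start tag
prefix = stringFromServer.match(/(^[\s\S]*?<html[^>]*>)/);
prefix = prefix ? prefix[1] : "<!doctype html><html>";
suffix = stringFromServer.match(/(<\/html>\s*$)/);
suffix = suffix ? suffix[1] : "</html>";
html.innerHTML = stringFromServer.replace(/^[\s\S]*?<html[^>]*>/, '').replace(/<\/html>\s*$/, '');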

There, we take the server's string, grab the outermost HTML parts (or use defaults), and then assign the inner HTML to an html element inside a fragment (although the more I think about it, the less I see the need for a fragment at all — you can probably just drop the fragment part). (Side Note: The part of the regular expressions above that identifies the start tag for the html element, <html[^>]*>, is one of those "good enough" things. It isn't perfect, and in particular will fail if you have a > inside a quoted attribute value, like this: <html data-foo="I have a > in me">, which is perfectly valid. Working around that requires much harder parsing, so I've assumed above that you don't do it, as it's fairly unusual.)
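If a > inside a quoted attribute value on the html tag is a realistic case for you, the pattern can be relaxed so that it steps over quoted values. The following is a sketch only and is still not a real HTML parser (for example, it would be fooled by the text <html inside a comment that precedes the real tag). splitHtmlSource and the two constant names are hypothetical.

```javascript
// Matches everything up to and including the html start tag. Quoted
// attribute values are consumed as a unit, so a > inside them is not
// mistaken for the end of the tag.
const HTML_START = /^[\s\S]*?<html(?:"[^"]*"|'[^']*'|[^>"'])*>/i;
// Matches the closing html tag plus any trailing whitespace.
const HTML_END = /<\/html>\s*$/i;

// Hypothetical helper: splits a full page source into the part before the
// html element's content, the content itself, and the closing part.
function splitHtmlSource(source) {
    const start = source.match(HTML_START);
    const end = source.match(HTML_END);
    return {
        prefix: start ? start[0] : "<!doctype html><html>",
        inner: source.slice(start ? start[0].length : 0, end ? end.index : source.length),
        suffix: end ? end[0] : "</html>"
    };
}
```

When both tags are found, prefix + inner + suffix gives back the input string unchanged, which is the property the approach relies on.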

Then you can find elements within it via html.querySelector and html.querySelectorAll in order to apply your id attributes to them. Forming the relevant selectors will be great fun, probably a lot of positional stuff.
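As a rough idea of what the positional selectors could look like, here is a sketch that turns an element's position into a chain of :nth-child steps. nthChildPath is a hypothetical name, and the approach only works under the assumption that the live document and the re-parsed source have the same element structure, so it breaks as soon as scripts or users add or remove elements.

```javascript
// Hypothetical helper: builds a selector such as
// "html > :nth-child(2) > :nth-child(3)" describing where an element sits,
// by walking up through parentElement and recording each child index.
function nthChildPath(element) {
    const steps = [];
    let node = element;
    while (node.parentElement) {
        const siblings = node.parentElement.children;
        const index = Array.prototype.indexOf.call(siblings, node) + 1; // :nth-child is 1-based
        steps.unshift(":nth-child(" + index + ")");
        node = node.parentElement;
    }
    // node is now the root element, which has no parent element
    return [node.tagName.toLowerCase()].concat(steps).join(" > ");
}

// Browser usage, assuming `html` is the detached element built above and
// `liveElement` is the element in the real document that was given an id:
//   html.querySelector(nthChildPath(liveElement)).id = liveElement.id;
```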

When you're done, getting back the HTML to send to the server looks like this:

var stringToSend = prefix + html.innerHTML + suffix;

That solves part of the problem which is great, creating a dynamic object and storing the original source code, but I cannot assume the html tag won't have any class and the trickiest part is to find the elements within the original source code if ids and class might be slightly different. @dandavis commented something about xpath which might be one solution for it. — human, Jul 10 '14 at 09:46
@human: Yes, xpath may be an option. About having a class on `<html>`, the above handles that (with the `prefix`). If the *structure* of the document won't change (you're only adding `id`s and changing classes), you can always get to the element by figuring out what the path is from the root: E.g., "Okay, the user is adding `id="foo"` to the third child of the second child of the 17th child of the first child of the `body` element." Those indexes are easy to determine, and feed into `:nth-child` selectors for `querySelector`. Do you have to handle the case where elements are added or removed? — T.J. Crowder, Jul 10 '14 at 09:57
There may be cases in which elements will be added or removed as users will provide their own html code. — human, Jul 10 '14 at 10:49
@human: I think I've come up with something for you, see the update. — T.J. Crowder, Jul 10 '14 at 12:34
