
I want a JavaScript function that removes all text from a web page. The background is that, in order to compare the appearance of the rendered DOM in different browsers, I first need to eliminate the obvious differences. Since font rendering is a known difference, I want to remove all text. The solutions I found always looked like this:

if(start.nodeType === Node.TEXT_NODE) 
{
    start.parentNode.removeChild(start);
}

But this only removes a node that is itself a text node. I also want to handle constructs like:

 <div>
        <p>
             <em>28.11.2014</em>
             <img></img>
                Testtext
             <span>
                <i>Testtext</i>
                Testtext
             </span>
        </p>
  </div>

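Here the element that contains text also has element children (in the example, the <p> holds <em>, <img> and <span> next to its own text). Because of that, the element itself is not a text node, and the snippet above never reaches the text nodes that sit between those children.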

So I basically want to turn the above DOM into this:

 <div>
        <p>
             <em></em>
             <img></img>
             <span>
                <i></i>
             </span>
        </p>
  </div>
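For reference, a minimal sketch of what the question describes, assuming only the standard firstChild, nextSibling, nodeType, parentNode and removeChild DOM properties. It walks the subtree with an explicit stack, collects every text node (including text that sits next to element children such as <img> or <span>), and only removes them afterwards, so the tree is never modified while it is being walked. The function name removeAllTextNodes is hypothetical.

```javascript
// Hypothetical helper: remove every text node below `root`, including text
// that is mixed with element children (e.g. <p><img>Testtext<span>...</span></p>).
function removeAllTextNodes(root) {
  const TEXT_NODE = 3; // numeric value of Node.TEXT_NODE
  const found = [];
  const stack = [root];

  // Phase 1: walk the whole subtree with an explicit stack and collect text nodes.
  while (stack.length > 0) {
    const node = stack.pop();
    for (let child = node.firstChild; child; child = child.nextSibling) {
      if (child.nodeType === TEXT_NODE) {
        found.push(child);
      } else {
        stack.push(child); // elements (and comments) are searched further
      }
    }
  }

  // Phase 2: detach the collected nodes; traversal is finished, so this is safe.
  for (const text of found) {
    text.parentNode.removeChild(text);
  }
  return found.length; // number of removed text nodes
}
```

In a browser you would call it as removeAllTextNodes(document.body); the return value is only there to make the effect easy to check.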
Schnodderbalken

3 Answers


You can try something like this:

HTML:

<div id="startFrom">
    <p>
        <em>28.11.2014</em>
            <img></img>
            Testtext
        <span>
            <i>Testtext</i>
            Testtext
        </span>
    </p>
</div>  

JavaScript:

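var startFrom = document.getElementById("startFrom");

// Recursively walk the subtree below `node` and empty every text node.
// Element children (<em>, <img>, <span>, ...) are kept; only their text goes.
function traverseDom(node) {
    node = node.firstChild;
    while (node) {
        if (node.nodeType === Node.TEXT_NODE) { // nodeType 3
            node.data = ""; // empty the text node but leave it in the tree
        }
        traverseDom(node); // text nodes have no children, so this returns at once for them
        node = node.nextSibling;
    }
}

traverseDom(startFrom);
console.log(startFrom);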
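If you want to remove the text nodes instead of emptying them, note that a node detached with removeChild no longer has a nextSibling, so the loop has to read the next sibling before removing. A minimal sketch of that variant, using the same standard DOM properties as traverseDom; the name removeTextNodesInPlace is hypothetical.

```javascript
// Hypothetical variant of traverseDom that removes text nodes instead of
// setting their data to "".
function removeTextNodesInPlace(node) {
  let child = node.firstChild;
  while (child) {
    // Remember the next sibling first: after removeChild, child.nextSibling is null.
    const next = child.nextSibling;
    if (child.nodeType === 3) { // 3 === Node.TEXT_NODE
      node.removeChild(child);
    } else {
      removeTextNodesInPlace(child); // descend into element children
    }
    child = next;
  }
}
```

Called as removeTextNodesInPlace(startFrom), it leaves the elements in place but without any text nodes, including whitespace-only ones.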
Givi

With jQuery:

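// .find("*") matches all descendants; .contents() returns their child nodes, text nodes included.
// Note: text directly inside the root is skipped (.find excludes the root; .addBack() would add it).
$('selector').find("*").contents().filter(function() {
    return this.nodeType === Node.TEXT_NODE; // 3
}).remove();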
Sampath Liyanage
Also a good solution. I wonder whether it is as fast as the plain JavaScript solution. One thing to note: you have to take care of iframes when you select the whole DOM, otherwise you will run into this: DOMException: Failed to read the 'contentDocument' property from 'HTMLIFrameElement': Blocked a frame with origin "http://www.example.com" from accessing a cross-origin frame. – Schnodderbalken Nov 30 '14 at 19:37
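Regarding the iframe issue in the comment above: a plain DOM traversal over firstChild/nextSibling never enters an iframe's document, so it does not hit that exception. If you also want to strip text inside same-origin iframes, a minimal sketch could read contentDocument inside a try/catch. The current HTML spec has contentDocument return null for a cross-origin frame, while the comment shows a browser throwing a DOMException instead, so the sketch handles both. The name removeTextIncludingFrames is hypothetical.

```javascript
// Hypothetical helper: remove text nodes below `node` and also inside
// same-origin iframes; cross-origin frames are skipped.
function removeTextIncludingFrames(node) {
  let child = node.firstChild;
  while (child) {
    const next = child.nextSibling; // read before a possible removal
    if (child.nodeType === 3) { // Node.TEXT_NODE
      node.removeChild(child);
    } else {
      if (child.nodeName === 'IFRAME') { // upper-case in HTML documents
        let frameDoc = null;
        try {
          frameDoc = child.contentDocument; // null (or a throw) when cross-origin
        } catch (e) {
          frameDoc = null; // blocked cross-origin frame
        }
        if (frameDoc && frameDoc.body) {
          removeTextIncludingFrames(frameDoc.body);
        }
      }
      removeTextIncludingFrames(child); // fallback content of the element itself
    }
    child = next;
  }
}
```

Called as removeTextIncludingFrames(document.body), this cleans the main page plus every frame the page is allowed to access.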

The code below has only been roughly tested, but you can try putting it in an external .js file and running it from your document's onload handler:

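function cleantxt()
{
    // Work on the markup inside <html>, i.e. <head> and <body>.
    var htmlsrc = document.documentElement.innerHTML;
    var htmlnew = '';
    var istag = false; // true while scanning between '<' and '>'
    for (var i = 0; i < htmlsrc.length; i++) {
        var ch = htmlsrc.charAt(i);
        if (ch == '<') {
            istag = true;
            htmlnew += ch;
        }
        else if (ch == '>') {
            istag = false;
            htmlnew += ch;
        }
        else if (istag) {
            htmlnew += ch; // inside a tag: keep tag name and attributes
        }
        // anything outside a tag is text content and is dropped
    }
    // Caveats: a '>' inside an attribute value ends the tag early, the contents
    // of <script>/<style> are dropped too, and re-assigning innerHTML discards
    // event listeners attached by scripts.
    document.documentElement.innerHTML = htmlnew + 'Cleaned'; // just a signature to see it works
}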
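The same tag-scanning idea can be sketched as a pure string function, which makes it easy to try outside the browser. This hypothetical stripTextFromMarkup additionally tracks quotes inside a tag, so a '>' in a quoted attribute value no longer ends the tag early. It still treats comments and <script>/<style> bodies naively, so it is an illustration, not an HTML parser.

```javascript
// Hypothetical helper: keep only the tags of an HTML string and drop all text
// between them. Quoted attribute values may contain '>' safely.
function stripTextFromMarkup(html) {
  let out = '';
  let inTag = false;
  let quote = null; // the open quote character inside a tag, or null
  for (const ch of html) {
    if (!inTag) {
      if (ch === '<') {
        inTag = true;
        out += ch;
      }
      // any other character outside a tag is text content: drop it
    } else {
      out += ch; // everything inside a tag is kept
      if (quote) {
        if (ch === quote) quote = null; // closing quote
      } else if (ch === '"' || ch === "'") {
        quote = ch; // opening quote of an attribute value
      } else if (ch === '>') {
        inTag = false; // end of the tag
      }
    }
  }
  return out;
}
```

For example, stripTextFromMarkup(document.documentElement.innerHTML) gives the page markup without text, which could then be assigned back the same way cleantxt() does.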
Timothy Ha