Let's say I have the following element with the id TEXT in HTML:

<div id="TEXT">
  <p>First <strong>Line</strong></p>
  <p>Second <em>Line</em></p>
</div>

How should one extract the raw text from this element, without HTML tags, but preserving the line breaks?


I know about the following two options but neither of them seems to be perfect:

  1. document.getElementById("TEXT").textContent
    • returns
      • First LineSecond Line
    • problem: ignores the line break that should be included between paragraphs
  2. document.getElementById("TEXT").innerText
    • returns
      • First Line Second Line
    • problem: is not part of the W3C standard and is not guaranteed to work in all browsers
BartoNaz
    http://caniuse.com/#feat=innertext shows that innerText has more support than textContent, which is not compatible with IE8. See also http://stackoverflow.com/questions/21033887/preserving-newlines-when-using-text-or-textcontent-possible-alternatives and http://perfectionkills.com/the-poor-misunderstood-innerText/ – mplungjan Jul 10 '16 at 11:59

2 Answers

Here's a handy function for getting the text content of any element. It works on all platforms, and it preserves the line breaks that are present as whitespace in the markup.

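function text(e){
    var t = "";
    // Accept either a node or a NodeList of its children
    e = e.childNodes || e;
    for(var i = 0;i<e.length;i++){
        if(e[i].nodeType === 3 || e[i].nodeType === 4){
            // Text and CDATA nodes: append their content
            t += e[i].nodeValue;
        } else if(e[i].nodeType === 1){
            // Elements: recurse into their children
            t += text(e[i].childNodes);
        }
        // Comments and other node types are skipped
    }
    return t;
}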
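
Like textContent, the function above only keeps line breaks that already exist as whitespace between the tags. If the goal is one line per paragraph regardless of how the markup is indented, a variation could emit a newline after each block-level element. The following is only a sketch under assumptions: the names BLOCK_TAGS, textWithBreaks and extractLines are made up for illustration, and the list of block tags is deliberately short and incomplete.

```javascript
// Hypothetical helpers, not part of any library.
// BLOCK_TAGS is an assumed, incomplete list of elements that end a line.
var BLOCK_TAGS = { P: true, DIV: true, LI: true, H1: true, H2: true, H3: true, PRE: true, BLOCKQUOTE: true };

function textWithBreaks(node) {
    var out = "";
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        if (child.nodeType === 3) {
            // Text node: collapse whitespace runs, including source indentation
            out += child.nodeValue.replace(/\s+/g, " ");
        } else if (child.nodeType === 1) {
            var name = child.nodeName.toUpperCase();
            if (name === "BR") {
                // An explicit line break in the markup
                out += "\n";
            } else {
                out += textWithBreaks(child);
                // End the current line after a block-level element
                if (BLOCK_TAGS[name]) {
                    out += "\n";
                }
            }
        }
        // Comments and other node types are skipped
    }
    return out;
}

function extractLines(root) {
    // Trim each line and drop the empty ones left over from indentation
    return textWithBreaks(root)
        .split("\n")
        .map(function (line) { return line.trim(); })
        .filter(function (line) { return line.length > 0; })
        .join("\n");
}
```

In a browser this would be called as extractLines(document.getElementById("TEXT")). Because empty lines are dropped, intentional blank lines (for example from two consecutive br elements) are lost; whether that is acceptable depends on the use case.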
ZenDD

You can check how jQuery does it. It uses Sizzle for this. Here is the function, which you can use directly:

<div id="TEXT">
  <p>First <strong>Line</strong></p>
  <p>Second <em>Line</em></p>
</div>
<script>
var getText = function( elem ) {
    var node,
        ret = "",
        i = 0,
        nodeType = elem.nodeType;

    if ( !nodeType ) {
        // If no nodeType, this is expected to be an array
        while ( (node = elem[i++]) ) {
            // Do not traverse comment nodes
            ret += getText( node );
        }
    } else if ( nodeType === 1 || nodeType === 9 || nodeType === 11 ) {
        // Use textContent for elements
        // innerText usage removed for consistency of new lines (jQuery #11153)
        if ( typeof elem.textContent === "string" ) {
            return elem.textContent;
        } else {
            // Traverse its children
            for ( elem = elem.firstChild; elem; elem = elem.nextSibling ) {
                ret += getText( elem );
            }
        }
    } else if ( nodeType === 3 || nodeType === 4 ) {
        return elem.nodeValue;
    }
    // Do not include comment or processing instruction nodes

    return ret;
};
console.log(getText(document.getElementById('TEXT')));
</script>
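
Since getText falls back on textContent, its result for the markup above still contains the newlines and indentation that sit between the tags in the source. One possible follow-up step, sketched here with a hypothetical helper named normalizeLines, is to split that string on line breaks and clean up each line. This assumes every paragraph sits on its own line in the source, as in the example; minified markup has no such line breaks and would come back as a single line.

```javascript
// Hypothetical helper: cleans up whitespace-laden textContent output.
// Assumes the markup source places each paragraph on its own line.
function normalizeLines(raw) {
    return raw
        .split(/\r?\n/)
        // Collapse inner whitespace runs and strip indentation
        .map(function (line) { return line.replace(/\s+/g, " ").trim(); })
        // Drop lines that were only whitespace
        .filter(function (line) { return line.length > 0; })
        .join("\n");
}
```

It would be combined with the function above as normalizeLines(getText(document.getElementById('TEXT'))).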
ARIF MAHMUD RANA