How to get the pure text without HTML element using JavaScript?

Question

I have one button and some text in my HTML, like the following:

function get_content(){
   // I don't know what to write here!!!
}

<input type="button" onclick="get_content()" value="Get Content"/>
<p id='txt'>
<span class="A">I am</span>
<span class="B">working in </span>
<span class="C">ABC company.</span>
</p>

When the user clicks the button, the content in the `<p>` should become the following expected result:

<p id='txt'>
<!-- All the HTML elements within the <p> will disappear -->
I am working in ABC company.
</p>

Can anyone help me write the JavaScript function?

Thank you.

Does this answer your question? [Strip HTML from Text JavaScript](https://stackoverflow.com/questions/822452/strip-html-from-text-javascript) — KyleMit, Jan 11 '20 at 20:09

score 252 · Answer 1 · edited May 24 '17 at 11:58


You can use this:

var element = document.getElementById('txt');
var text = element.innerText || element.textContent;
// write it back as text, so any "<" in it is not re-parsed as HTML
element.textContent = text;

Depending on what you need, you can use either `element.innerText` or `element.textContent`. They differ in several ways: `innerText` approximates what you would get by selecting the rendered text and copying it to the clipboard (it is CSS-aware, so elements hidden with `display: none` are left out), while `textContent` simply strips the tags and returns all the text that is left, including hidden text and the contents of `<script>` and `<style>` elements.

`innerText` also works in old IE browsers, where it originated; IE8 and earlier do not support `textContent`.
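The `||` chain above returns the first truthy value, and an empty string counts as false, which is the point the comments below make about old IE. A minimal sketch runnable in Node, using hypothetical plain objects in place of real elements (only the property names `innerText` and `textContent` come from the DOM):

```javascript
// Hypothetical stand-ins for elements; only innerText/textContent are modelled.
const emptyInOldIE = { innerText: "", textContent: undefined }; // IE8: no textContent
const filled = { innerText: "I am working", textContent: "I am working" };

// Order used in the answer: innerText first, no final fallback.
function getTextA(el) {
  return el.innerText || el.textContent;
}

// Order suggested in the comments: textContent first, "" as a last resort.
function getTextB(el) {
  return el.textContent || el.innerText || "";
}

console.log(getTextA(emptyInOldIE)); // undefined, because "" || undefined is undefined
console.log(getTextB(emptyInOldIE)); // ""
console.log(getTextB(filled));       // "I am working"
```

With real elements the order also decides which of the two texts you get when both properties are available.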

edited May 24 '17 at 11:58 by Matthias

answered Jul 19 '11 at 07:58 by Gabi Purcaru


+1 - Was looking for some high performance `text` method since it gets done a lot in a loop. jQuery was not performant enough, but this was very fast. Worked in IE8+, chrome, ff. Perfect. – Travis J Apr 19 '13 at 19:48

On old IE, `el.textContent` will be `undefined` and `el.innerText` might be `""`. But `"" || undefined` is `undefined`. Using `el.innerText || el.textContent || ''` may be better. – Oriol Mar 12 '15 at 17:08

innerText doesn't return hidden text and content of script/style tags while textContent does. If you're on a version of IE which supports textContent, it might be preferable to use it first, so `el.textContent || el.innerText || ""`. – Domino May 24 '15 at 16:21

Just a note for anyone reading this answer in present day, more than six years after this answer, these days you can just use `var text = element.textContent;`; unless for some ungodly reason you still have to support [IE8 or below](https://developer.mozilla.org/en-US/docs/Web/API/Node/textContent#Browser_compatibility). – Useless Code Nov 21 '17 at 15:33
`el.innerText` is roughly the same as `el.textContent.replace(/\W+/g, ' ')`. They are not the same. – Polv Dec 06 '19 at 10:00
is there a way to do this in node? – chovy Oct 30 '20 at 08:10
there is a rich html-to-text library for Node – wnm3 Nov 04 '21 at 20:24

score 84 · Accepted Answer · edited Nov 10 '20 at 17:24


[2017-07-25] since this continues to be the accepted answer, despite being a very hacky solution, I'm incorporating Gabi's code into it, leaving my own to serve as a bad example.

// my hacky approach:
function get_content() {
  var html = document.getElementById("txt").innerHTML;
  document.getElementById("txt").innerHTML = html.replace(/<[^>]*>/g, "");
}
// Gabi's elegant approach, but eliminating one unnecessary line of code:
function gabi_content() {
  var element = document.getElementById('txt');
  // assign as text, not innerHTML, so a "<" in the text is not parsed as markup
  element.textContent = element.innerText || element.textContent;
}
// and exploiting the fact that IDs pollute the window namespace:
function txt_content() {
  txt.textContent = txt.innerText || txt.textContent;
}

.A {
  background: blue;
}

.B {
  font-style: italic;
}

.C {
  font-weight: bold;
}

<input type="button" onclick="get_content()" value="Get Content (bad)" />
<input type="button" onclick="gabi_content()" value="Get Content (good)" />
<input type="button" onclick="txt_content()" value="Get Content (shortest)" />
<p id='txt'>
  <span class="A">I am</span>
  <span class="B">working in </span>
  <span class="C">ABC company.</span>
</p>

edited Nov 10 '20 at 17:24 by Alessio Cantarella

answered Jul 19 '11 at 08:08 by jcomeau_ictx


Bad because hacky and slow. Is there even a guarantee that the rendered text itself must never contain tags? – Domi Jan 09 '14 at 14:19

no, there is no such guarantee. I gave a disclaimer when I posted. it apparently served the purpose of the OP. – jcomeau_ictx Jan 09 '14 at 17:12

Trying to parse HTML with regular expressions is really dangerous: it's practically impossible (I suspect it may be _theoretically_ impossible) to get right. There are too many edge cases, and then your code blows up when faced with strange input, which can frequently be exploited to do XSS. – David Given Feb 04 '15 at 22:37

my guess as to why it was accepted: it's a complete answer, which can be immediately cut-and-pasted as is into an html file and tested with a browser. I never said it was a *good* answer. I posted after seeing all the *good* answers were there, and not accepted, and figured the OP needed a little handholding. it still is good enough for any application for which the HTML source is already known not to contain unbalanced angle brackets. – jcomeau_ictx Aug 29 '16 at 23:39
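The comments above warn that stripping tags with a regular expression breaks on some input. A small Node-runnable sketch applying the same `/<[^>]*>/g` pattern to HTML strings; the inputs are made-up examples, not taken from the question:

```javascript
// Same pattern as get_content() in the "hacky approach" above.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Simple markup like the question's works:
console.log(stripTags('<span class="A">I am</span> <span class="B">working in</span>'));
// "I am working in"

// A ">" inside an attribute value ends the match early and leaks markup:
console.log(stripTags('<span title="a > b">x</span>'));
// ' b">x'

// A comment containing a tag is only partly removed:
console.log(stripTags("a <!-- <b> --> c"));
// "a  --> c"

// Entities are left encoded rather than decoded:
console.log(stripTags("<b>Tom &amp; Jerry</b>"));
// "Tom &amp; Jerry"
```

When the markup is already in the DOM, reading `textContent` avoids all of these, because the browser's parser has already handled attributes, comments and entities.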

score 25 · Answer 3 · answered Jul 19 '11 at 08:07


If you can use jQuery, then it's simple:

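var text = $("#txt").text(); // all descendant text, tags stripped
$("#txt").text(text);        // replace the contents with that plain text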

answered Jul 19 '11 at 08:07 by Sarath


I just have to say, look at all the pure JS answers and then look at this one. This is the second most important reason why I use jQuery (i.e., it simplifies tasks, reduces my workload, and increases readability). The first most important reason (to me) is because it handles many cross-compatibility issues, I might otherwise not even be aware of (like using jQuery to adjust opacity, so that I don't have to write a separate line just for IE8 to target the `filter` property. I know that pure JS is technically more efficient when it comes to speed, but that hardly matters anymore in most normal.. – VoidKing Oct 01 '13 at 14:22

pure js one liner equivalent: `document.querySelector("#txt").innerText;` People include the entire jQuery library far too often when their only need is a couple of lines of code. It's bad practice. – Levi Mar 11 '18 at 13:36

score 11 · Answer 4 · answered Sep 05 '14 at 19:32

This answer will work to get just the text for any HTML element.

The first parameter, `node`, is the element to get the text from. The second parameter, `addSpaces`, is optional; if true, a space is added between the texts of adjacent nodes when no whitespace would otherwise separate them.

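function getTextFromNode(node, addSpaces) {
    var i, result, text, child;
    result = '';
    for (i = 0; i < node.childNodes.length; i++) {
        child = node.childNodes[i];
        text = null;
        if (child.nodeType === 1) {
            // element node: collect its text recursively
            text = getTextFromNode(child, addSpaces);
        } else if (child.nodeType === 3) {
            // text node: take its raw value
            text = child.nodeValue;
        }
        // any other node type (comments, etc.) leaves text null and is skipped
        if (text) {
            // optionally add a space when both sides would otherwise run together
            if (addSpaces && /\S$/.test(result) && /^\S/.test(text)) text = ' ' + text;
            result += text;
        }
    }
    return result;
}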
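`getTextFromNode` only reads `childNodes`, `nodeType` and `nodeValue`, so its behaviour can be checked outside a browser. A minimal sketch that runs in Node, assuming hypothetical plain objects built by made-up helpers (`elementNode`, `textNode`, `commentNode`, not part of any DOM API) in place of real nodes; the function body is the one from the answer above, repeated so the snippet stands alone:

```javascript
// getTextFromNode as in the answer above, repeated so this file runs on its own.
function getTextFromNode(node, addSpaces) {
  var i, result, text, child;
  result = '';
  for (i = 0; i < node.childNodes.length; i++) {
    child = node.childNodes[i];
    text = null;
    if (child.nodeType === 1) {
      text = getTextFromNode(child, addSpaces);
    } else if (child.nodeType === 3) {
      text = child.nodeValue;
    }
    if (text) {
      if (addSpaces && /\S$/.test(result) && /^\S/.test(text)) text = ' ' + text;
      result += text;
    }
  }
  return result;
}

// Hypothetical helpers: plain objects carrying only the properties read above.
// nodeType 1 = element, 3 = text, 8 = comment (skipped by the function).
const elementNode = (...children) => ({ nodeType: 1, childNodes: children });
const textNode = value => ({ nodeType: 3, nodeValue: value, childNodes: [] });
const commentNode = value => ({ nodeType: 8, nodeValue: value, childNodes: [] });

// Like the question's <p>, but with no whitespace between the spans.
const p = elementNode(
  elementNode(textNode("I am")),
  commentNode(" not part of the text "),
  elementNode(textNode("working in")),
  elementNode(textNode("ABC company."))
);

console.log(getTextFromNode(p));       // "I amworking inABC company."
console.log(getTextFromNode(p, true)); // "I am working in ABC company."
```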

score 2 · Answer 5 · edited May 23 '17 at 12:10

Depending on what you need, you can use either `element.innerText` or `element.textContent`. They differ in many ways: `innerText` approximates what you would get by selecting the rendered HTML and copying it to the clipboard, while `textContent` just strips the HTML tags and gives you what's left.

`innerText` is no longer IE-only; it is supported in all major browsers. Unlike `textContent`, it also works in old IE versions, since that is where it originated.

Complete example (from Gabi's answer):

var element = document.getElementById('txt');
var text = element.innerText || element.textContent; // or element.textContent || element.innerText
element.textContent = text; // set it as text, not as HTML

score 2 · Answer 6 · answered Feb 08 '19 at 17:59

This works for me. It is compiled from what was said here, using more modern syntax, and it works best when you need to process multiple elements.

// '.myClass' stands for whichever elements you want to flatten
const elements = document.querySelectorAll('.myClass');
elements.forEach(item => {
  item.textContent = item.innerText || item.textContent;
  console.log(item.textContent);
});

score 1 · Answer 7 · answered Jul 19 '11 at 08:00

That should work:

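function get_content(){
   var p = document.getElementById("txt");
   var spans = p.getElementsByTagName("span");
   var parts = [];
   for (var i = 0; i < spans.length; i++){
       // textContent rather than innerHTML, so tags nested in a span are dropped too
       parts.push(spans[i].textContent.trim());
   }
   // the whitespace between the spans is not inside any span, so add it back
   p.textContent = parts.join(" ");
}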

Try this fiddle: http://jsfiddle.net/7gnyc/2/
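Collecting only the `<span>` contents loses the whitespace between them, because the line breaks in the question's markup sit between the spans, not inside them. A sketch comparing plain concatenation with trimming and joining on a single space, using the three span texts from the question as plain data (no DOM involved):

```javascript
// The text of the three spans in the question, in document order.
const spanTexts = ["I am", "working in ", "ABC company."];

// Plain concatenation of the span contents:
const concatenated = spanTexts.join("");
console.log(concatenated); // "I amworking in ABC company."

// Trimming each piece and joining with a single space:
const spaced = spanTexts.map(s => s.trim()).join(" ");
console.log(spaced); // "I am working in ABC company."
```

Joining with a space assumes each span is a separate word group; reading `textContent` of the whole `<p>` instead keeps whatever whitespace the markup actually has.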

score 1 · Answer 8 · answered Jul 19 '11 at 08:00

function get_content(){
 var p = document.getElementById('txt');
 // the spans have classes A, B and C (not ids), so select them by class
 var returnText = p.querySelector('.A').textContent + ' ' +
                  p.querySelector('.B').textContent +
                  p.querySelector('.C').textContent;
 p.textContent = returnText;
}

That should do it.

score 0 · Answer 9 · edited Aug 19 '19 at 21:01

Try this (a shorter version of Gabi's idea):

function get_content() {
   // assigning textContent replaces all child nodes with a single text node
   txt.textContent = txt.textContent;
}


span { background: #fbb}

<input type="button" onclick="get_content()" value="Get Content"/>
<p id='txt'>
<span class="A">I am</span>
<span class="B">working in </span>
<span class="C">ABC company.</span>
</p>

edited Aug 19 '19 at 21:01

answered Aug 19 '19 at 20:54 by Kamil Kiełczewski

score 0 · Answer 10 · answered Aug 12 '20 at 19:13

You want to change `I am working in ABC company.` to `I am working in ABC company.` These are the same strings, so I don't see a reason to, but you can do it with JavaScript's `innerHTML` or `textContent`.

`element.innerHTML` is a property that defines the HTML inside an element. If you write `element.innerHTML = "<b>This is bold</b>"`, the text "This is bold" will be shown in bold.

`element.textContent`, on the other hand, sets the text in an element. If you write `element.textContent = "<b>This is bold</b>"`, the text will not be bold: the user will literally see `<b>This is bold</b>`.
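Setting `textContent` shows a string literally. When a string has to go through `innerHTML` instead, the usual equivalent is to escape the characters HTML treats specially before assigning it; the same concern applies to writing extracted text back with `innerHTML = text`, since a `<` in that text would be parsed as markup. A minimal sketch of such a helper (`escapeHtml` is a hypothetical name, not a built-in), meant for element content only, not attribute values:

```javascript
// Hypothetical helper: escapes a string for use as HTML element content.
// "&" is replaced first so the entities added afterwards are not double-escaped.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(escapeHtml("<b>This is bold</b>"));
// "&lt;b&gt;This is bold&lt;/b&gt;"
```

In a browser, `element.innerHTML = escapeHtml(s)` should then display the same characters as `element.textContent = s`.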

In your case, you can use either one. I'll use `.textContent`. The code to change the `<p>` element is below.

function get_content(){
   document.getElementById("txt").textContent = "I am working in ABC company.";
}

<input type="button" onclick="get_content()" value="Get Content"/>
<p id='txt'>
<span class="A">I am</span>
<span class="B">working in </span>
<span class="C">ABC company.</span>
</p>

Unfortunately, this will not visibly change anything, because it sets the exact same text. You can change that by replacing the string "I am working in ABC company." with something else.

I think you misunderstood. In John's question, the text "I am working in ABC company." is just an example; he doesn't want to set the content of the `<p>` to a literal string. He doesn't state explicitly what he wants, but if you read the question carefully it's clear. First, he wants a function that will *get* the contents of the `<p>`: `function get_content()`. Second, he indicates in a note/comment that "All the HTML element within the `<p>` will be disappear". So what he wants is to get the content of the `<p>`, something like `innerHTML`, --> (continued) — Kevin Fegan, Aug 30 '21 at 07:03
except he wants all the HTML tags within the content of the `<p>` to be removed. So in the example case, he wants to remove all the `<span>` and `</span>` tags and return only the text "I am working in ABC company.", but in a generalized way, so it returns whatever text is actually in the `<p>`. So something like: `var p=document.getElementById("txt"); p.innerHTML=p.textContent;`. — Kevin Fegan, Aug 30 '21 at 07:13
