Javascript replace tag but preserve content

Question

Say I have text like this:

This should also be extracted, <strong>text</strong>

I need only the text from the entire string. I have tried this:

r = r.replace(/<strong[\s\S]*?>[\s\S]*?<\/strong>/g, "$1");

but it failed (strong is still there). Is there a proper way to do this?

Expected Result

This should also be extracted, text

Solution:

To target a specific tag, I used this:

r = r.replace(/<strong\b[^>]*>([^<>]*)<\/strong>/i, "**$1**")
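
For what it's worth, the "$1" in the first attempt has nothing to refer to, because that pattern contains no capture group. A minimal sketch of the difference follows. `stripStrong` is a hypothetical helper name, and using a bare "$1" replacement together with the `g` flag is an assumption about the intended result (text only, every occurrence), not the exact code above.

```javascript
// Hypothetical helper: unwrap every <strong> element, keeping its text.
// The parentheses create capture group 1, which "$1" refers to.
function stripStrong(html) {
  return html.replace(/<strong\b[^>]*>([^<>]*)<\/strong>/gi, "$1");
}

var input = "This should also be extracted, <strong>text</strong>";
var unwrapped = stripStrong(input);
console.log(unwrapped);

// The first attempt has no capture group, so current engines leave
// "$1" in the output as literal text.
var noGroup = input.replace(/<strong[\s\S]*?>[\s\S]*?<\/strong>/g, "$1");
console.log(noGroup);
```

Like any regex approach, this only handles `strong` elements whose content has no nested tags.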

score 3 · Answer 1 · edited May 23 '17 at 11:49

To parse HTML, you need an HTML parser. See this answer for why.

If you just want to remove <strong> and </strong> from the text, you don't need parsing, but of course simplistic solutions tend to fail, which is why you need an HTML parser to parse HTML. Here's a simplistic solution that removes <strong> and </strong>:

str = str.replace(/<\/?strong>/g, "")

var yourString = "This should also be extracted, <strong>text</strong>";
yourString = yourString.replace(/<\/?strong>/g, "")
display(yourString);

function display(msg) {
  // Show a message, making sure any HTML tags show
  // as text
  var p = document.createElement('p');
  p.innerHTML = msg.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  document.body.appendChild(p);
}
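
To illustrate the point about simplistic solutions: the pattern above matches only the bare tags, so an opening tag that carries an attribute survives. A quick sketch, where the `class="note"` input is made up for illustration:

```javascript
var pattern = /<\/?strong>/g;

// Bare tags: both the opening and the closing tag are removed.
var plain = "This should also be extracted, <strong>text</strong>";
var plainResult = plain.replace(pattern, "");
console.log(plainResult);

// Hypothetical input with an attribute: only the closing tag matches,
// so the opening tag is left behind.
var withAttr = 'Some <strong class="note">text</strong>';
var attrResult = withAttr.replace(pattern, "");
console.log(attrResult);
```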

Back to parsing: In your case, you can easily do it with the browser's parser, if you're on a browser:

var yourString = "This should also be extracted, <strong>text</strong>";
var div = document.createElement('div');
div.innerHTML = yourString;
display(div.innerText || div.textContent);

function display(msg) {
  // Show a message, making sure any HTML tags show
  // as text
  var p = document.createElement('p');
  p.innerHTML = msg.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  document.body.appendChild(p);
}

Most browsers provide innerText; Firefox provides textContent, which is why there's that || there.

In a non-browser environment, you'll want some kind of DOM library (there are lots of them).
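
Another way to reach a real parser is `DOMParser`, which builds a separate document instead of assigning to `innerHTML` on an element created in the current page. A minimal sketch, assuming a browser or a DOM library that supplies a compatible `DOMParser`; `textFromHtml` and its optional `parser` parameter are hypothetical names introduced here:

```javascript
// Hypothetical helper: parse an HTML string and return its text content.
// `parser` defaults to the browser's DOMParser; outside a browser, pass in
// a compatible object from whatever DOM library you use.
function textFromHtml(html, parser) {
  parser = parser || new DOMParser();
  var doc = parser.parseFromString(html, "text/html");
  return doc.body.textContent;
}

// In a browser:
// textFromHtml("This should also be extracted, <strong>text</strong>");
```

Taking the parser as an argument keeps the helper usable in both environments without tying it to one particular library.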

I am not parsing the HTML; I just need the text without the tag, to be shown in a text document later — user2002495, Oct 12 '14 at 11:43
@user2002495: To reliably get the content within HTML tags, you have to parse HTML. It's as simple as that. Attempts to use simplistic rules *will* fail. — T.J. Crowder, Oct 12 '14 at 11:46
Thanks for all the answers. In the end I solved it on my own, but all your answers were enlightening — user2002495, Oct 12 '14 at 11:51
@user2002495: FWIW, for just `strong` and without attributes (based on your comments elsewhere), I did add a regex example. — T.J. Crowder, Oct 12 '14 at 11:55

Amit Joki · Accepted Answer · 2014-10-12T11:47:38.090

2

You can do this:

var r = "This should also be extracted, <strong>text</strong>";
r = r.replace(/<(.+?)>([^<]+)<\/\1>/,"$2");
console.log(r);

I have used a fairly strict regex here. If you want a more relaxed version, you can do:

r = r.replace(/<.+?>/g,"");
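
A rough sketch of how the two patterns behave on slightly different inputs. The input with an attribute and the one with bare angle brackets are made up for illustration:

```javascript
var strict = /<(.+?)>([^<]+)<\/\1>/;
var relaxed = /<.+?>/g;

// The question's input: the backreference \1 matches "strong" in the closing tag.
var simple = "This should also be extracted, <strong>text</strong>";
var strictSimple = simple.replace(strict, "$2");
console.log(strictSimple);

// Hypothetical input with an attribute: group 1 would have to be
// 'strong class="x"', which the closing tag does not repeat, so the
// pattern does not match and nothing is replaced.
var tagged = '<strong class="x">text</strong>';
var strictTagged = tagged.replace(strict, "$2");
console.log(strictTagged);

// The relaxed pattern strips every tag, but also anything else that
// happens to sit between a "<" and a later ">".
var relaxedTagged = tagged.replace(relaxed, "");
var relaxedMath = "a < b and c > d".replace(relaxed, "");
console.log(relaxedTagged);
console.log(relaxedMath);
```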

edited Oct 12 '14 at 11:47

answered Oct 12 '14 at 11:41

Amit Joki


Thanks, is it possible to apply the regex only to the strong tag with your code? – user2002495 Oct 12 '14 at 11:47
Thanks, see the solution; I have been able to target a specific tag only – user2002495 Oct 12 '14 at 11:50
@user2002495 don't forget to accept any answer which helped you the most – Amit Joki Oct 12 '14 at 11:51
This does, of course, fail with `text` Hence needing to parse. – T.J. Crowder Oct 12 '14 at 11:52
I don't need to check for attributes, since I control all of the DOM elements myself in an uneditable iframe; I just need to be able to do what I want – user2002495 Oct 12 '14 at 11:52
@T.J.Crowder well, yeah.. won't answer any more of these :) – Amit Joki Oct 12 '14 at 11:53
