
Say I have text like this:

This should also be extracted, <strong>text</strong>

I need only the text from the entire string. I have tried this:

r = r.replace(/<strong[\s\S]*?>[\s\S]*?<\/strong>/g, "$1");

but it failed (strong is still there). Is there a proper way to do this?

Expected Result

This should also be extracted, text
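As quoted, the attempted pattern contains no capture group, so `$1` has nothing to refer to. In that case JavaScript's `replace()` copies the characters `$1` into the result literally instead of the tag's text. A minimal sketch of a fix, assuming the goal is only to unwrap `<strong>` elements, is to put the inner text in a capture group (the variable names are illustrative):

```javascript
var input = "This should also be extracted, <strong>text</strong>";

// The attempt as quoted: the pattern has no (...) group, so "$1" is
// inserted into the result as literal text.
var broken = input.replace(/<strong[\s\S]*?>[\s\S]*?<\/strong>/g, "$1");

// The fix: capture the inner text as group 1 so "$1" refers to it.
// \b stops tags such as <strongest> from matching; [^>]* allows attributes.
var fixed = input.replace(/<strong\b[^>]*>([\s\S]*?)<\/strong>/g, "$1");

console.log(broken); // This should also be extracted, $1
console.log(fixed);  // This should also be extracted, text
```

Like any regex approach, this still breaks on nested `<strong>` elements.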

Solution:

To target a specific tag, I used this:

r = r.replace(/<strong\b[^>]*>([^<>]*)<\/strong>/i, "**$1**")
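Note that this solution turns `<strong>x</strong>` into Markdown-style `**x**` rather than removing the tags, and because the pattern has only the `i` flag, `replace()` stops after the first match. A small sketch with a made-up sample string shows what adding `g` changes:

```javascript
var r = '<strong>one</strong> and <STRONG class="x">two</STRONG>';

// With only the i flag, replace() converts the first match and stops.
var firstOnly = r.replace(/<strong\b[^>]*>([^<>]*)<\/strong>/i, "**$1**");

// Adding g converts every match; i keeps the tag name case-insensitive.
var all = r.replace(/<strong\b[^>]*>([^<>]*)<\/strong>/gi, "**$1**");

console.log(firstOnly); // **one** and <STRONG class="x">two</STRONG>
console.log(all);       // **one** and **two**
```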

Question by user2002495, edited by T.J. Crowder.

2 Answers


To parse HTML, you need an HTML parser. See this answer for why.

If you just want to remove <strong> and </strong> from the text, you don't need a parser. Simplistic solutions like that tend to fail on more varied HTML, though, which is why you need an HTML parser to parse HTML in general. Here's a simplistic solution that removes <strong> and </strong>:

str = str.replace(/<\/?strong>/g, "")

var yourString = "This should also be extracted, <strong>text</strong>";
yourString = yourString.replace(/<\/?strong>/g, "")
display(yourString);

function display(msg) {
  // Show a message, making sure any HTML tags show
  // as text
  var p = document.createElement('p');
  p.innerHTML = msg.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  document.body.appendChild(p);
}
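To illustrate how such a simplistic pattern fails, here is a short sketch with made-up sample strings: the exact pattern `/<\/?strong>/g` misses an opening tag that has attributes, and a looser pattern (an adjustment, not part of the answer) handles that case but is still not a parser:

```javascript
// Exact match: removes only bare <strong> and </strong>.
var exact = /<\/?strong>/g;
// Looser: case-insensitive and allows attributes; \b rejects tags like <strongest>.
var looser = /<\/?strong\b[^>]*>/gi;

var bare = "This should also be extracted, <strong>text</strong>";
var withAttr = 'Attributes break it: <strong class="x">text</strong>';

var bareResult = bare.replace(exact, "");       // both tags removed
var exactResult = withAttr.replace(exact, "");  // opening tag left behind
var looserResult = withAttr.replace(looser, ""); // both tags removed

console.log(bareResult);   // This should also be extracted, text
console.log(exactResult);  // Attributes break it: <strong class="x">text
console.log(looserResult); // Attributes break it: text
```

The looser pattern still breaks if an attribute value contains `>`, because `[^>]*` stops at the first `>` it sees.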

Back to parsing: if you're in a browser, you can easily do this with the browser's own parser:

var yourString = "This should also be extracted, <strong>text</strong>";
var div = document.createElement('div');
div.innerHTML = yourString;
display(div.innerText || div.textContent);

function display(msg) {
  // Show a message, making sure any HTML tags show
  // as text
  var p = document.createElement('p');
  p.innerHTML = msg.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  document.body.appendChild(p);
}

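Most browsers provide innerText, but older versions of Firefox provided only textContent, which is why there's that || there. (Firefox has supported innerText since version 45, and all current browsers support the standard textContent.)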

In a non-browser environment, you'll want some kind of DOM library (there are lots of them).

Answer by T.J. Crowder.

You can do this:

var r = "This should also be extracted, <strong>text</strong>";
r = r.replace(/<(.+?)>([^<]+)<\/\1>/,"$2");
console.log(r);
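How the strict version works: `(.+?)` captures everything inside the opening tag, and `\1` is a backreference that requires the closing tag to repeat it. That means a tag with attributes never matches, since `\1` would have to be `strong class="x"`. The pattern also has no `g` flag, so only the first element is unwrapped. A sketch with a hypothetical adjusted pattern that captures only the tag name:

```javascript
var strict = /<(.+?)>([^<]+)<\/\1>/;
// Hypothetical variant: capture just the tag name, skip any attributes,
// and use g so every matching element is unwrapped.
var nameOnly = /<(\w+)\b[^>]*>([^<]*)<\/\1>/g;

var plain = "This should also be extracted, <strong>text</strong>";
var withAttr = 'Hi <strong class="x">text</strong>';

var r1 = plain.replace(strict, "$2");      // tag removed
var r2 = withAttr.replace(strict, "$2");   // no match, string unchanged
var r3 = withAttr.replace(nameOnly, "$2"); // tag removed

console.log(r1); // This should also be extracted, text
console.log(r2); // Hi <strong class="x">text</strong>
console.log(r3); // Hi text
```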

That's a fairly strict regex. If you want a relaxed version that strips every tag, you can use:

r = r.replace(/<.+?>/g,"");
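The relaxed pattern removes everything from a `<` to the next `>`, so it also eats text that merely contains those characters. A short sketch with made-up inputs shows two such cases, which is why the other answer recommends a real HTML parser:

```javascript
var relaxed = /<.+?>/g;

// Simple markup works as intended.
var ok = "This should also be extracted, <strong>text</strong>".replace(relaxed, "");
// A literal "<" ... ">" in plain text is treated as a tag and removed.
var comparison = "1 < 2 and 3 > 0".replace(relaxed, "");
// A ">" inside a quoted attribute value ends the "tag" too early.
var quoted = '<a title="x>y">link</a>'.replace(relaxed, "");

console.log(ok);         // This should also be extracted, text
console.log(comparison); // 1  0
console.log(quoted);     // y">link
```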
Answer by Amit Joki.