To parse HTML, you need an HTML parser. See this answer for why.
If you just want to remove <strong>
and </strong>
from the text, you don't need parsing, but of course simplistic solutions tend to fail, which is why you need an HTML parser to parse HTML. Here's a simplistic solution that removes <strong>
and </strong>
:
str = str.replace(/<\/?strong>/g, "")
var yourString = "This should also be extracted, <strong>text</strong>";
yourString = yourString.replace(/<\/?strong>/g, "")
display(yourString);
function display(msg) {
// Show a message, making sure any HTML tags show
// as text
var p = document.createElement('p');
p.innerHTML = msg.replace(/&/g, "&").replace(/</g, "<");
document.body.appendChild(p);
}
Back to parsing: In your case, you can easily do it with the browser's parser, if you're on a browser:
var yourString = "This should also be extracted, <strong>text</strong>";
var div = document.createElement('div');
div.innerHTML = yourString;
display(div.innerText || div.textContent);
function display(msg) {
// Show a message, making sure any HTML tags show
// as text
var p = document.createElement('p');
p.innerHTML = msg.replace(/&/g, "&").replace(/</g, "<");
document.body.appendChild(p);
}
Most browsers provide innerText
; Firefox provides textContent
, which is why there's that ||
there.
In a non-browser environment, you'll want some kind of DOM library (there are lots of them).