I have some HTML as a string

var str= "<p><br/></p>"

How do I strip the p tags from this string using JS? Here is what I have tried so far:

str.replace(/<p[^>]*>(?:\s|&nbsp;)*<\/p>/, "")  // output: <p><br/></p>
str.replace("/<p[^>]*><\\/p[^>]*>/", "")        // output: <p><br/></p>
str.replace(/<p><br><\/p>/g, "")                // output: <p><br/></p>

All of them return the same string as above. The expected output is an empty string (""). What am I doing wrong here?

Thanks

Rikku
user1234

2 Answers

You probably should not be using RegExp to parse HTML - it's not particularly useful with (X)HTML-style markup as there are way too many edge cases.

Instead, parse the HTML as you would an element in the DOM, then compare the trim()med innerText value of each <p> with a blank string, and remove those that are equal:

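var str = "<p><br/></p><p>This paragraph has text</p>";
// Parse the string into a detached element so the browser handles the HTML.
var ele = document.createElement('body');
ele.innerHTML = str;
[...ele.querySelectorAll('p')].forEach(para => {
  // para.remove() also works when the <p> is nested (e.g. inside a <div>),
  // where ele.removeChild(para) would throw because para is not a direct child.
  if (para.innerText.trim() === "") para.remove();
});

console.log(ele.innerHTML);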
esqew

You should be able to use the following expression: <p[^>]*>(&nbsp;|\s+|<br\s*\/?>)*<\/p>

The expression above matches any <p>...</p> element whose content consists only of &nbsp; entities, whitespace (\s+), and <br> tags (including the <br/> and <br /> variants).

I think you were mostly there with /<p[^>]*>(?:\s|&nbsp;)*<\/p>/. The ?: only makes the group non-capturing, so it is harmless and can stay or go; what the pattern was missing is an additional case for <br>.

const str = `
<p><br></p>
<p><br/></p>
<p><br /></p>
<p> <br/> </p>
<p> </p>
<p>&nbsp; </p>
<p><br/> &nbsp;</p>
<p>
  <br>
</p><!-- multiline -->
<p><br/> don't replace me</p>
<p>don't replace me</p>
`;

const exp = /<p[^>]*>(&nbsp;|\s+|<br\s*\/?>)*<\/p>/g;

console.log(str.replace(exp, ''));
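
To connect this back to the question, here is a rough sketch of why each of the three original attempts leaves the string untouched. The variable names (input, attempt1, emptyParagraph, and so on) are made up for illustration. It also assumes that part of the confusion may be that replace() returns a new string rather than changing str in place; the question does not say how the result was checked.

```javascript
const input = "<p><br/></p>";

// Attempt 1: only whitespace or &nbsp; is allowed between the tags,
// so the <br/> prevents a match.
const attempt1 = input.replace(/<p[^>]*>(?:\s|&nbsp;)*<\/p>/, "");

// Attempt 2: the pattern is a quoted string, so replace() searches for that
// literal text (slashes included) instead of treating it as a regex.
const attempt2 = input.replace("/<p[^>]*><\\/p[^>]*>/", "");

// Attempt 3: matches "<br>" exactly, but the input contains "<br/>".
const attempt3 = input.replace(/<p><br><\/p>/g, "");

// Same expression as above, written with a non-capturing group;
// (?:...) and (...) match the same text here.
const emptyParagraph = /<p[^>]*>(?:&nbsp;|\s+|<br\s*\/?>)*<\/p>/g;

// Strings are immutable: replace() returns a new string, so the result
// has to be assigned back for str to change.
let str = input;
str.replace(emptyParagraph, "");        // result discarded
const unchanged = str;                  // still the original string
str = str.replace(emptyParagraph, "");  // now str is updated

console.log([attempt1, attempt2, attempt3, unchanged, str]);
```

The first four logged values are the untouched input and only the last one is the empty string.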
Soc