regex to replace "<p><br/></p>" string with empty string - Javascript

Question

I have some HTML as a string

var str= "<p><br/></p>"

How do I strip the p tags from this string using JS? Here is what I have tried so far:

str.replace(/<p[^>]*>(?:\s|&nbsp;)*<\/p>/, "") // o/p: <p><br></p>'
str.replace("/<p[^>]*><\\/p[^>]*>/", "")// o/p: <p><br></p>'
str.replace(/<p><br><\/p>/g, "")// o/p: <p><br></p>'

All of them return the same str as above. The expected output is an empty string (""). What am I doing wrong here?

Thanks

Could you prevent the offending string from being added to the HTML so that you don't need to try removing it? — Andrew Morton, Oct 13 '21 at 19:08
I'm using the React Quill component and that component adds this value. I am not sure if I can prevent it without touching the component. At this point my best shot is to remove the empty <p> tags and <br> tag — user1234, Oct 13 '21 at 19:10
regex is preferred, but if regex does not work, anything would do — user1234, Oct 13 '21 at 19:11
yeah, tried that too `str.replaceAll("/<p><br><\/p>/g", "");` - didn't work — user1234, Oct 13 '21 at 19:12
the outcome should have no tags in it - it should be the `""` string — user1234, Oct 13 '21 at 19:15
I'm saying it's just as important to define for us what should *not* be removed as what *should* be removed. One example of input and expected output tells us almost nothing - especially when supported by contradictory text explanation of it being an _empty_ tag (it's not). Define the problem better please. — Wyck, Oct 13 '21 at 19:17
@user1234 in your `replaceAll()` you used quotes, regexes should be without quotes — roneicostajr, Oct 13 '21 at 19:17
str = str.replace(/<(\w+)\/?>(<\/\1>)?<\/p>/g, '') str = str.replace(/ <\/p>/g, '') ... — Raven Murphy, Oct 13 '21 at 19:18

score 3 · Accepted Answer · answered Oct 13 '21 at 19:14

You probably should not be using RegExp to parse HTML - it's not particularly useful with (X)HTML-style markup as there are way too many edge cases.

Instead, parse the HTML as you would an element in the DOM, then compare the trim()med innerText value of each <p> with a blank string, and remove those that are equal:

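var str = "<p><br/></p><p>This paragraph has text</p>";
// Parse the string into a detached element (browser DOM required)
var ele = document.createElement('body');
ele.innerHTML = str;
// Copy the NodeList into an array so removals don't affect the iteration
[...ele.querySelectorAll('p')].forEach(para => {
  // remove() also works when the <p> is nested and not a direct child of ele
  if (para.innerText.trim() === "") para.remove();
});

// Only the paragraph that has text is left
console.log(ele.innerHTML);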

score 2 · Answer 2 · answered Oct 13 '21 at 19:45

You should be able to use the following expression: <p[^>]*>(&nbsp;|\s+|<br\s*\/?>)*<\/p>

The expression above looks at content enclosed in <p>...</p> and matches it against &nbsp;, whitespace (\s+) and <br> (and its / variations).

I think you were mostly there with /<p[^>]*>(?:\s|&nbsp;)*<\/p>/. The ?: only makes the group non-capturing, so it can stay or go; what was missing was an additional case for <br>.

const str = `
<p><br></p>
<p><br/></p>
<p><br /></p>
<p> <br/> </p>
<p> </p>
<p>&nbsp; </p>
<p><br/> &nbsp;</p>
<p>
  <br>
</p><!-- multiline -->
<p><br/> don't replace me</p>
<p>don't replace me</p>
`;

const exp = /<p[^>]*>(&nbsp;|\s+|<br\s*\/?>)*<\/p>/g;

console.log(str.replace(exp, ''));
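To tie this back to the question, here is a minimal sketch of why each of the three original attempts leaves the string unchanged, next to the expression above wrapped in a small helper. The name stripEmptyParagraphs is hypothetical and only used for illustration. It runs anywhere JavaScript runs, since it needs no DOM.

```javascript
// Pattern from this answer. stripEmptyParagraphs is a hypothetical helper name.
const EMPTY_P = /<p[^>]*>(&nbsp;|\s+|<br\s*\/?>)*<\/p>/g;

function stripEmptyParagraphs(html) {
  // replace() returns a new string; the input string is never modified in place.
  return html.replace(EMPTY_P, '');
}

const str = "<p><br/></p>";

// Attempt 1: \s and &nbsp; do not cover <br/>, so nothing matches.
const attempt1 = str.replace(/<p[^>]*>(?:\s|&nbsp;)*<\/p>/, "");
// Attempt 2: a quoted pattern is searched for as literal text, slashes included.
const attempt2 = str.replace("/<p[^>]*><\\/p[^>]*>/", "");
// Attempt 3: the pattern expects <br>, but the input contains <br/>.
const attempt3 = str.replace(/<p><br><\/p>/g, "");

console.log([attempt1, attempt2, attempt3].every(s => s === str)); // true
console.log(JSON.stringify(stripEmptyParagraphs(str)));            // ""
console.log(stripEmptyParagraphs("<p>keep</p><p>&nbsp;<br /></p>")); // <p>keep</p>
```

Whichever pattern you use, remember to assign the result (str = str.replace(...)), because calling replace() on its own does not change str.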
