
I would like to remove empty paragraphs such as <p align="left" dir="ltr">&nbsp;</p> and <p>&nbsp;</p> from a string.

What I have so far only strips whitespace and &nbsp; entities, not the tags themselves:

str.replace(/\s|&nbsp;/g, '')

I need to format the string because it will be part of an email template. It is not a complete HTML document.

Code Hungry
    Are those the only paragraphs in the string? Are they nested, etc.? A regex is generally not a very good way to modify HTML; parsing is. – adeneo Jun 29 '15 at 11:35

1 Answer


Regex is the wrong tool for this.
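
As a quick illustration, here is roughly what the replace call from the question does. The sample string is an assumption modelled on the fragments in the question, not the asker's real data. The tags survive (now mangled), and whitespace inside real content is lost as well:

```javascript
// Sample input: an assumption, modelled on the fragments shown in the question.
const sample = '<p align="left" dir="ltr">&nbsp;</p> <p>&nbsp;</p><p>Hello world</p>';

// The replace call from the question: removes whitespace and &nbsp; entities only.
const result = sample.replace(/\s|&nbsp;/g, '');

console.log(result);
// → <palign="left"dir="ltr"></p><p></p><p>Helloworld</p>
```

The empty paragraphs are still there, the first tag is no longer a valid <p>, and "Hello world" has lost its space.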

If you're doing it in a browser, it's easy:

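var div = document.createElement('div');
div.innerHTML = str; // let the browser parse the fragment
// querySelectorAll returns a static NodeList, so removing nodes while iterating is safe
Array.prototype.forEach.call(div.querySelectorAll('p'), function(p) {
    var html = p.innerHTML.trim();
    // innerHTML serializes a non-breaking space as "&nbsp;"
    if (!html || html.toLowerCase() == "&nbsp;") {
        p.parentNode.removeChild(p);
    }
});
str = div.innerHTML; // Yes, the case of tag names may have changed, etc., but nothing substantive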

If you're doing it in another environment, there's an HTML parser available for that environment. Node.js has several, including cheerio. The JVM (if you're using JavaScript on the JVM) has the excellent jsoup. .NET (if you're using "JScript") has a port of jsoup. Etc.

T.J. Crowder