2

I want to "strip" all (somewhat) unnecessary whitespace from the HTML markup.

Obj: nodeValue

Render: render

For a nodeValue like that, this solution works perfectly well.


However, when the text contains a non-breaking space (&nbsp;), the browser renders it, as we know, differently.

Obj: nodeValue

Render: [image]


I want to "strip" the string just like the DOM renderer does.

What is the regex that does the job? Are there other pitfalls where I might cut something that is actually "needed" for rendering?

NOTE: I'm operating on innerHTML, so the client can't help me...

Aron Woost
  • Do **NOT** use regexes to manipulate HTML. You'll just end up with a trashed document. Use [HTMLTidy](http://tidy.sourceforge.net/) for such things, and read [this](http://stackoverflow.com/a/1732454/118068) for the reason why regex+html = BAD – Marc B Jan 23 '12 at 20:17
  • @MarcB I knew I would get hit by possibly the most famous SO thread sometime :) Don't worry, I know what I'm doing... – Aron Woost Jan 23 '12 at 20:19

2 Answers

6

This replaces all two-or-more whitespaces with a single space:

myStr = myStr.replace(/\s{2,}/g,' ');

However, this will break anywhere you have a <pre> tag, or more generally anywhere that CSS white-space:pre is applied. To be correct, you would need to call getComputedStyle() on the elements in question and apply this transformation only to the text nodes where the whitespace is not significant.
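
One caveat that may matter for the &nbsp; case in the question: in JavaScript, \s also matches U+00A0, the character that &nbsp; decodes to, so the pattern above would collapse non-breaking spaces as well. What follows is only a minimal sketch, assuming you want to collapse ASCII whitespace alone and skip literal <pre> blocks. The names collapseWhitespace and collapseOutsidePre are hypothetical, and the sketch does not account for CSS white-space rules, <textarea>, or whitespace inside attribute values.

```javascript
// Hypothetical helper: collapse runs of two or more ASCII whitespace
// characters into one space. The class [ \t\n\r\f] deliberately leaves
// out U+00A0 (non-breaking space), which \s would match.
function collapseWhitespace(text) {
  return text.replace(/[ \t\n\r\f]{2,}/g, ' ');
}

// Hypothetical helper: collapse whitespace in markup but leave literal
// <pre>...</pre> blocks untouched. This is string-level only; it knows
// nothing about computed styles such as white-space: pre.
function collapseOutsidePre(html) {
  // Splitting on a capturing group keeps the matched <pre> blocks in the
  // result array, at the odd indices.
  return html
    .split(/(<pre\b[\s\S]*?<\/pre>)/i)
    .map((part, i) => (i % 2 === 1 ? part : collapseWhitespace(part)))
    .join('');
}
```

Calling collapseOutsidePre(myStr) in place of the plain replace would keep both non-breaking spaces and <pre> content intact. For elements styled with white-space:pre through CSS, the getComputedStyle() check described above is still needed.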

Phrogz
2

Replacing /\s\s+/g with " " would do the trick.

Niet the Dark Absol