
Using jQuery, I'd like to remove the whitespace and line breaks between HTML tags.

var widgetHTML = '    <div id="widget">        <h2>Widget</h2><p>Hi.</p>        </div>';

Should be:

alert(widgetHTML); // <div id="widget"><h2>Widget</h2><p>Hi.</p></div>

I think the pattern I will need is:

>[\s]*<

Can this be accomplished without using regex?
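For comparison, the regex route from the question can be sketched with the native `String.prototype.replace`. This is a minimal sketch, assuming the markup is a plain string like `widgetHTML`. It uses `\s+` rather than `[\s]*` so that only actual whitespace runs are replaced, and a final `trim()` handles the spaces before the first tag, which the pattern cannot reach. Treat it as a quick hack: a regex cannot tell when whitespace is significant (between inline elements, inside `<pre>`), and a `>` inside an attribute value can fool it.

```javascript
var widgetHTML = '    <div id="widget">        <h2>Widget</h2><p>Hi.</p>        </div>';

// Replace every run of whitespace (spaces, tabs, newlines) that sits
// directly between a closing ">" and an opening "<" with nothing,
// then strip the whitespace before the first tag and after the last one.
var cleaned = widgetHTML.replace(/>\s+</g, '><').trim();

console.log(cleaned); // <div id="widget"><h2>Widget</h2><p>Hi.</p></div>
```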

mager

8 Answers


I tried the technique that user76888 laid out and it worked nicely. I packaged it into a jQuery plugin for convenience and thought the community might enjoy it, so here it is:

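jQuery.fn.cleanWhitespace = function() {
    // Remove the child text nodes (nodeType 3) that contain nothing but whitespace
    this.contents().filter(
        function() { return (this.nodeType == 3 && !/\S/.test(this.nodeValue)); })
        .remove();
    return this; // allow chaining, like other jQuery methods
};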

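To use it, include it in a script tag after jQuery, then select the element you want to clean with jQuery and call the method on it. Only the direct children of the selected element are affected, since `.contents()` does not descend any further: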

$('#widget').cleanWhitespace();
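The plugin relies on nothing jQuery-specific beyond `.contents()` and `.remove()`, so the same idea can be written against the plain DOM. A minimal sketch, assuming `parent` is a DOM element; the function name `removeWhitespaceChildren` is hypothetical. Like the plugin, it only looks at direct children.

```javascript
// Remove every whitespace-only text node that is a direct child of
// `parent` (hypothetical helper; `parent` is assumed to be a DOM element).
function removeWhitespaceChildren(parent) {
  // Copy the live NodeList first so removals don't shift the iteration.
  var nodes = Array.prototype.slice.call(parent.childNodes);
  nodes.forEach(function (node) {
    // nodeType 3 is a text node; /\S/ fails when it holds only whitespace.
    if (node.nodeType === 3 && !/\S/.test(node.nodeValue)) {
      parent.removeChild(node);
    }
  });
  return parent;
}
```

In a browser you would call it as `removeWhitespaceChildren(document.getElementById('widget'))`.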
The Digital Gabeg
  • Might I suggest adding a `return this;` after the cleanup? This will allow the user to chain off of your function. – Matt Mills Mar 03 '11 at 06:25
  • Excellent idea! That's what every other jQuery function does, so my function should do it too. Done. – The Digital Gabeg Mar 03 '11 at 17:25
  • Just what I needed to remove whitespace between buttons in a JSP before adjusting their margins. – kevin cline Jul 29 '11 at 04:30
  • What should I do here to remove only those empty nodes which appear before the first non-empty node? Other empty nodes that are in between should remain as they are. Thanks for any help... – teenup Sep 21 '11 at 11:01
  • very nice! I'd name it `removeWhitespace` rather than `cleanWhitespace` though. – isapir Sep 07 '17 at 23:23

A recursive version:

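jQuery.fn.htmlClean = function() {
    this.contents().filter(function() {
        if (this.nodeType != 3) {
            // Not a text node: clean its descendants, but keep the node itself
            $(this).htmlClean();
            return false;
        }
        else {
            // Text node: trim it (this also strips meaningful leading and
            // trailing spaces, e.g. " text." becomes "text."), then remove it
            // if nothing but whitespace was left
            this.textContent = $.trim(this.textContent);
            return !/\S/.test(this.nodeValue);
        }
    }).remove();
    return this;
};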
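This recursive version trims every text node, which is why it removes the space after the closing strong tag mentioned in the comments. If you only want to drop whitespace-only nodes and leave all other text untouched, a plain-DOM sketch could look like this. The name `removeWhitespaceDeep` is hypothetical, and `node` is assumed to be a DOM element.

```javascript
// Recursively remove whitespace-only text nodes from `node` and all of
// its descendants. Text containing any non-whitespace character is left
// exactly as it is, so "text <strong>more</strong> text." keeps its spaces.
function removeWhitespaceDeep(node) {
  // Copy the live NodeList so removals don't disturb the iteration.
  var children = Array.prototype.slice.call(node.childNodes);
  children.forEach(function (child) {
    if (child.nodeType === 1) {
      // ELEMENT_NODE: descend into it
      removeWhitespaceDeep(child);
    } else if (child.nodeType === 3 && !/\S/.test(child.nodeValue)) {
      // TEXT_NODE holding nothing but whitespace
      node.removeChild(child);
    }
  });
  return node;
}
```

Note that a whitespace-only node between two inline elements (for example `<b>a</b> <i>b</i>`) is still removed, which changes how that text renders.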
Shripad Krishna
Blago
  • This worked better than the answers from @user76888 and @the-digital-gabeg. I just modified it slightly to `return this;`, as @the-digital-gabeg revised his to do. This solved my issues with whitespace in tables, which is an IE9 bug. – John Yeary Jan 08 '13 at 15:24
  • This removes the space after the closing strong tag in the following string: "text more text." – Marcus Jan 24 '15 at 21:16

I think this will do it...

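function cleanWhitespace(element) {
  // Walk backwards: removing a node shifts the indexes of the ones after it,
  // so a forward loop would skip the node that follows each removed one.
  for (var i = element.childNodes.length - 1; i >= 0; i--) {
    var node = element.childNodes[i];
    // nodeType 3 is a text node; /\S/ fails when it holds only whitespace
    if (node.nodeType == 3 && !/\S/.test(node.nodeValue))
      element.removeChild(node);
  }
  return element;
}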
user76888
  • This method is great, but it doesn't work in IE before version 9, which disregards the whitespace text nodes and only counts the actual element nodes. So you can't loop through them and remove them :( But I have yet to find a better way... – Tokimon Dec 23 '10 at 10:14

This is what worked for me, along with the step-by-step discovery:

The output is from the Chrome console.

First, locate the parent node containing the nasty whitespace:

$('.controls label[class="radio"]').parent();

[<div class="controls">
<label class="radio">…</label>
" "
"    "
<label class="radio">…</label>
" "
"    "
</div>]

You can see from the [] brackets that this is wrapped in an array: jQuery always returns an array-like structure, even when only a single item has been found.

So to get to the HTMLElement itself, we take the first item in the array, at index 0:

$('.controls label[class="radio"]').parent()[0];

<div class="controls">
<label class="radio">…</label>
" "
"    "
<label class="radio">…</label>
" "
"    "
</div>

Note how there are no more [] brackets. We need the raw element because jQuery's element traversal methods, such as `.children()`, skip text nodes (whitespace included), whereas the element's own `childNodes` property lists every node. Look what happens when we access the `childNodes` property:

$('.controls label[class="radio"]').parent()[0].childNodes;

[<label class="radio">…</label>,
" ",
"    ",
<label class="radio">…</label>,
" ",
"    "]

We have an array-like list again (yes, you can spot the [] brackets), but do you see the other difference? Look at all the commas: each whitespace text node is now a separate item, which we couldn't get from jQuery. Thank you, HTMLElement! But now we can go back to jQuery, because I want to use `each` instead of a `for` loop (do you agree with me?). So let's wrap the list in jQuery and see what happens:

$($('.controls label[class="radio"]').parent()[0].childNodes);

[<label class="radio">…</label>,
" ",
"    ",
<label class="radio">…</label>,
" ",
"    "]

Perfect! We still have exactly the same structure, but now inside a jQuery object, so let's call `each` and log each node's HTML to the console to see what we have.

$($('.controls label[class="radio"]').parent()[0].childNodes).each(function () { 
   console.log('|'+$(this).html()+'|');
});

|<input id="gender_f" name="gender" type="radio" value="f">Female|
|undefined|
|undefined|
|<input id="gender_m" name="gender" type="radio" value="m" checked="">Male|
|undefined|
|undefined|

So we use jQuery to get the HTML of each node with `$(this).html()`, standard stuff, and because we can't see whitespace, let's pad the output with pipes (|). Good plan, but what do we have here? jQuery is not able to turn the whitespace text nodes into HTML, because `.html()` only reads the contents of element nodes, so for them we get `undefined`. But this is even better: a string of spaces would be truthy, whereas `undefined` is definitely falsy =)

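So let's get rid of the suckers with jQuery. All we need is `$(this).html() || $(this).remove();`. One caveat: an element with no content, such as `<br>`, `<input>` or an empty `<span>`, returns an empty string from `.html()`, which is also falsy, so it would be removed as well. Let's see: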

$($('.controls label[class="radio"]').parent()[0].childNodes).each(function () { 
   $(this).html() || $(this).remove();
});

[<label class="radio">…</label>,
" ",
"    ",
<label class="radio">…</label>,
" ",
"    "]

Oh dear... but don't fear! `each` returns the same jQuery object it was called on, which still holds the text nodes we just removed from the page. Let's look at what our initial query returns now:

$('.controls label[class="radio"]').parent();

[<div class="controls">
<label class="radio">…</label>
<label class="radio">…</label>
</div>]

And voilà! All sexy and pretty =)

So there you have it: how to remove whitespace between elements/tags, jQuery style.

nJoy!

nickl-

You can probably do this better after loading the HTML into a DOM node. Once the browser has parsed everything and built a DOM tree out of your markup, you can do a DOM walk, and for every text node that you find, either remove it completely if it has no non-whitespace characters, or trim whitespace off its start and end if it does.
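A minimal sketch of that walk, assuming the markup has already been parsed into a DOM element; the name `cleanTextNodes` is hypothetical. Whitespace-only text nodes are removed and every other text node is trimmed, as described, so spaces that separate inline elements from the surrounding text are lost too.

```javascript
// Walk every descendant of `node` (assumed to be a parsed DOM element).
// Whitespace-only text nodes are removed; any other text node has the
// whitespace at its start and end trimmed off.
function cleanTextNodes(node) {
  // Copy the live NodeList so removals don't disturb the iteration.
  var children = Array.prototype.slice.call(node.childNodes);
  children.forEach(function (child) {
    if (child.nodeType === 3) {
      // TEXT_NODE: trim it if it has content, otherwise drop it
      if (/\S/.test(child.nodeValue)) {
        child.nodeValue = child.nodeValue.trim();
      } else {
        node.removeChild(child);
      }
    } else if (child.nodeType === 1) {
      // ELEMENT_NODE: descend into it
      cleanTextNodes(child);
    }
  });
  return node;
}
```

In a browser, one way to get the parsed tree is a detached container: `var container = document.createElement('div'); container.innerHTML = widgetHTML; cleanTextNodes(container);`, after which `container.innerHTML` holds the cleaned markup.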

levik
  • +1. `$('<div>').append(widgetHTML)` is a start. From there you walk the child nodes inside the outer div. At the end of it, call `.html()` and you'd get your `widgetHTML` without empty whitespace nodes. – Roatin Marth Oct 08 '09 at 18:48

I had to modify the accepted answer a bit because, for some reason, Chrome didn't want to removeChild() whitespace nodes. If this happens, you could replace the node with an empty text node, as in this example helper function:

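var removeWhiteSpaceNodes = function (parent) {
  var nodes = parent.childNodes;
  for (var i = 0, l = nodes.length; i < l; i++) {
    if (nodes[i] && nodes[i].nodeType == 3 && !/\S/.test(nodes[i].nodeValue)) {
      // Swap the whitespace-only text node for an empty one; the number of
      // child nodes stays the same, so the cached length `l` remains valid
      parent.replaceChild(document.createTextNode(''), nodes[i]);
    } else if (nodes[i]) {
      // Recurse into every other node
      removeWhiteSpaceNodes(nodes[i]);
    }
  }
};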

It takes the node from which you want to remove whitespace and recursively replaces all of its whitespace-only text children with truly empty text nodes.

Quickredfox

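Use

$($.parseHTML(widgetHTML, document, true)).filter("*");

`$.parseHTML` turns the string into an array of DOM nodes, and `.filter("*")` keeps only the element nodes, so whitespace-only text nodes at the top level are dropped. Whitespace text nodes nested inside those elements are left in place.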
Specc

You could use `$.trim(widgetHTML);` to get rid of the surrounding whitespace.
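Trimming only touches the two ends of the string, which is why it does not answer the question. A quick check, using the native `String.prototype.trim` in place of `$.trim` so the snippet runs without jQuery (both strip leading and trailing whitespace):

```javascript
var widgetHTML = '    <div id="widget">        <h2>Widget</h2><p>Hi.</p>        </div>';

// trim() only strips whitespace from the two ends of the string
var trimmed = widgetHTML.trim();

console.log(trimmed.charAt(0));      // logs <, so the leading spaces are gone
console.log(/>\s+</.test(trimmed));  // logs true: whitespace between tags remains
```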

Jojo
  • The question is not about trimming the whitespace surrounding the HTML code; it is about removing the whitespace between HTML tags. – Arash Milani Apr 28 '12 at 13:48