
Most web pages are filled with significant amounts of whitespace and other useless characters, which results in wasted bandwidth for both the client and the server. This is especially true of large pages containing complex table structures and CSS styles defined at the element level. It seems like good practice to preprocess all your HTML files before publishing, as this would save a lot of bandwidth, and where I live, bandwidth ain't cheap.

It goes without saying that the optimisation should not affect the appearance of the page in any way (according to the HTML standard), or break any embedded JavaScript, backend ASP code, etc.

Some of the functions I'd like to perform are:

  • Removal of all whitespace and carriage returns. The parser needs to be smart enough not to strip whitespace from inside string literals. Removing space between HTML elements or attributes is mostly safe, but if I recall correctly, browsers will render a single space between div or span tags, so that space shouldn't be stripped.
  • Remove all comments from HTML and client-side scripts.
  • Remove redundant attribute values, e.g. <option selected="selected"> can be replaced with <option selected> (a rough sketch of these steps follows this list).
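
To make those points concrete, here is a rough sketch (my own, not a recommended tool) that collapses whitespace, drops comments, and shortens boolean attributes using only Python's standard html.parser. It deliberately ignores the hard cases mentioned above, such as embedded scripts, <pre> blocks, and attribute values containing quotes:

```python
# Rough sketch, not production code: collapse runs of whitespace in text,
# drop HTML comments, and shorten boolean attributes, using only the
# standard library. Scripts, <pre> blocks and other hard cases are ignored.
from html.parser import HTMLParser
import re

class Minifier(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            # Drop redundant values such as selected="selected".
            if value is None or value == name:
                parts.append(name)
            else:
                parts.append(f'{name}="{value}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Collapse runs of whitespace to a single space rather than deleting
        # them, so words and inline elements are not glued together.
        self.out.append(re.sub(r"\s+", " ", data))

    def handle_comment(self, data):
        pass  # comments are dropped entirely

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

def minify(html):
    m = Minifier()
    m.feed(html)
    m.close()
    return "".join(m.out)

print(minify('<div>  <!-- note -->  <option selected="selected">Hi   there</option>  </div>'))
```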

As if this wasn't enough, I'd like to take it even further and compress the CSS styles too. Pages with large tables often contain huge amounts of code like the following: <td style="TdInnerStyleBlaBlaBla">. The page would be smaller if the style identifier were shorter, e.g. <td style="x">. To this end, it would be great to have a tool that could rename all your styles to identifiers made up of the fewest characters possible. If there are too many styles to represent with the set of allowable single-character identifiers, then it would be necessary to move to longer identifiers, prioritising the shortest identifiers for the styles that are used most often.
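
Just to illustrate the "shortest names for the most-used styles" part, here is a rough Python sketch of the renaming map only. The function names and the lowercase a, b, ..., z, aa, ab, ... scheme are my own assumptions; a real tool would also have to rewrite the stylesheet and every reference to the renamed identifiers:

```python
# Sketch of the renaming idea: count how often each style identifier is used,
# then hand out the shortest generated names first, most-used styles first.
import itertools
import string
from collections import Counter

def short_names():
    """Yield a, b, ..., z, aa, ab, ... in order of increasing length."""
    chars = string.ascii_lowercase
    for length in itertools.count(1):
        for combo in itertools.product(chars, repeat=length):
            yield "".join(combo)

def build_rename_map(usages):
    """Map each original identifier to a short one, most frequent first."""
    counts = Counter(usages)
    names = short_names()
    return {original: next(names)
            for original, _ in counts.most_common()}

usages = ["TdInnerStyleBlaBlaBla"] * 50 + ["HeaderStyle"] * 5 + ["FooterStyle"] * 2
print(build_rename_map(usages))
# {'TdInnerStyleBlaBlaBla': 'a', 'HeaderStyle': 'b', 'FooterStyle': 'c'}
```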

In theory it should be quite easy to build a piece of software to do all this, as there are many XML parsers available to do the heavy lifting. Surely someone has already created a tool which can do all these things and is reliable enough to use on real-life projects. Does anyone here have experience with doing this?

Trent
    Standard answer: Unless you're Google, don't bother. Zip compressing your responses will save you a ton of traffic - it's usually enough. – Pekka Oct 13 '11 at 08:01
  • +1 what Pekka said. For HTML, gzip beats minification, and minification+gzip is of marginal benefit. – bobince Oct 13 '11 at 08:07
  • possible duplicate of [HTML online minimizer/compressor?](http://stackoverflow.com/questions/1654832/html-online-minimizer-compressor) – selbie Oct 13 '11 at 08:07
  • See the question I suggest as a dupe above. Just do an Internet search for "HTML minimizer" – selbie Oct 13 '11 at 08:10
  • I think you may break JScript & DOM-handling code with some of those optimizations. – Alexey Frunze Oct 13 '11 at 08:10

2 Answers


The term you're probably after is 'minify' or 'minification'.

This is very similar to an existing conversation which you may find helpful:

https://stackoverflow.com/questions/728260/html-minification

Also, depending on the web server you use and the browser used to look at your site, it is likely that your server is already compressing data without you having to do anything:

http://en.wikipedia.org/wiki/HTTP_compression
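
If you're not sure whether your server already does this, one quick way to check is to advertise gzip support in the request and look at the Content-Encoding response header. Here's a small standard-library Python sketch; the URL is a placeholder:

```python
# Quick check: does the server send a compressed body when the client says
# it accepts gzip? Standard library only; the URL is a placeholder.
import urllib.request

def is_gzip_enabled(url):
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Content-Encoding") == "gzip"

print(is_gzip_enabled("http://example.com/"))
```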

Derek Tomes

Your three points are actually called "minimizing" HTML/JS/CSS.

You can have a look at these figures: I have done some compression of HTML/JS/CSS too, in my personal distributed crawler, which uses gzip, bzip2, or 7zip (a small measurement sketch follows the list).

  • gzip = fastest, ~12-25% of the original file size
  • bzip2 = normal, ~10-20% of the original file size
  • 7zip = slow, ~7-15% of the original file size
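
Those ratios vary a lot with the input, so if you want numbers for your own pages, here is a small Python sketch using the standard-library codecs. lzma stands in for 7zip here (7z's default method is LZMA), and "page.html" is a placeholder file name:

```python
# Compare compressed sizes of a page with the standard-library codecs.
# lzma is used as a stand-in for 7zip; "page.html" is a placeholder.
import bz2, gzip, lzma

with open("page.html", "rb") as f:
    data = f.read()

for name, compress in (("gzip", gzip.compress),
                       ("bzip2", bz2.compress),
                       ("lzma (7zip)", lzma.compress)):
    size = len(compress(data))
    print(f"{name:12s} {size:8d} bytes  ({100 * size / len(data):.1f}% of original)")
```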
c2h2
  • The pages I want to compress are all dynamically generated, so the server will be compressing the page on every single request. This might end up being a performance loss, especially since we work in a virtual desktop environment. Our remote desktop software probably runs compression already, so on second thought, I don't think we'd be saving on internet traffic after all. – Trent Oct 14 '11 at 12:16