My idea is to minify HTML on the server side, so that the client receives fewer bytes.

What do I mean by "minify"?

Not zipping. More like what the jQuery creators do with their .min.js versions. In other words, I need to remove unnecessary whitespace and newlines, but I can't remove so much that the presentation of the HTML changes (for example, removing the whitespace between actual words in a paragraph).

Are there any tools that can do it? I know there is HtmlPurifier. Is it able to do it? Are there any other options?

P.S. Please don't offer regexes. I know that only Chuck Norris can parse HTML with them. =]

daGrevis
  • I don't think you need to do this. Most web servers support serving web pages "gzipped", and once they do, whitespace is no longer an issue. You should always serve your web pages gzipped. – Stephen Chung Apr 28 '11 at 09:56
  • You can write a simple program that uses an HTML parsing library to parse the HTML file and then write it back out. If you use C#, you can look at the LINQ-to-HTML library. – Stephen Chung Apr 28 '11 at 09:59
  • Agreeing with Stephen Chung: if you gzip the HTML, all whitespace will be compacted. It'll be a faster process than fixing up the HTML itself. – bart Apr 28 '11 at 11:55
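
The gzip claim in the comments above is easy to check. Here is a minimal sketch, assuming PHP's zlib extension is available; the sample markup and variable names are made up for illustration, and the whitespace collapse is only a crude stand-in for a real minifier:

```php
<?php
// Hypothetical sample page: heavily indented markup, repeated 200 times.
$row  = "        <li>\n            <a href=\"#\">Item</a>\n        </li>\n";
$html = "<ul>\n" . str_repeat($row, 200) . "</ul>\n";

// Crude whitespace collapse, used only to get a "minified" size to compare.
$minified = preg_replace('/\s+/', ' ', $html);

// gzencode() produces gzip-format data, the same format that is used
// for responses sent with "Content-Encoding: gzip".
$sizes = [
    'raw'             => strlen($html),
    'minified'        => strlen($minified),
    'raw + gzip'      => strlen(gzencode($html, 6)),
    'minified + gzip' => strlen(gzencode($minified, 6)),
];

foreach ($sizes as $label => $bytes) {
    printf("%-16s %6d bytes\n", $label, $bytes);
}
```

In a real deployment the compression is usually switched on in the web server (for example mod_deflate in Apache or the gzip directive in nginx) rather than done by hand. In PHP itself, ob_start('ob_gzhandler') or the zlib.output_compression setting are the usual options.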

5 Answers

A bit late, but still: with output buffering it is as simple as this:

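function compress($string)
{
    // Remove HTML comments. The pattern is non-greedy and uses the "s" flag so
    // that each comment is removed on its own, even if it spans several lines.
    // Note that this also strips IE conditional comments.
    $string = preg_replace('/<!--.*?-->/s', '', $string);

    // Merge runs of whitespace into a single space. This also affects <pre>,
    // <textarea> and inline scripts, so skip it if the page relies on those.
    $string = preg_replace('/\s+/', ' ', $string);

    // Remove whitespace between tags. Skip this step if you need the space
    // between inline elements such as <span>Hello</span> <span>World</span>.
    return preg_replace('/>\s+</', '><', $string);
}

// Buffer all output and pass it through compress() before it is sent.
ob_start('compress');

// Here goes your HTML.

ob_end_flush();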
Savas Vedova

You could parse the HTML code into a DOM tree (which should keep content whitespace in the nodes), then serialise it back into HTML, without any prettifying spaces.
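
One way this could look in PHP, as a rough sketch rather than a finished minifier: it uses the built-in DOMDocument and DOMXPath classes, and the function name minify_html_dom and the two tag lists are assumptions chosen for illustration. Text inside whitespace-sensitive elements is left alone, runs of whitespace elsewhere are collapsed to one space, and pure indentation is dropped only where it can never be rendered.

```php
<?php
// Text inside these elements is significant and must not be touched.
const PRESERVE_TAGS = ['pre', 'textarea', 'script', 'style'];
// Whitespace-only text directly inside these elements is never rendered.
const STRUCTURAL_TAGS = ['html', 'head', 'ul', 'ol', 'table', 'thead', 'tbody', 'tfoot', 'tr', 'select'];

function minify_html_dom(string $html): string
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the node list first, because nodes are removed while iterating.
    foreach (iterator_to_array($xpath->query('//text()')) as $text) {
        // Skip text that has a whitespace-sensitive ancestor.
        for ($node = $text->parentNode; $node !== null; $node = $node->parentNode) {
            if (in_array(strtolower($node->nodeName), PRESERVE_TAGS, true)) {
                continue 2;
            }
        }

        $collapsed = preg_replace('/\s+/', ' ', $text->data);
        $parent = strtolower($text->parentNode->nodeName);

        if ($collapsed === ' ' && in_array($parent, STRUCTURAL_TAGS, true)) {
            $text->parentNode->removeChild($text); // pure indentation
        } else {
            $text->data = $collapsed;              // keep one space between words
        }
    }

    return $doc->saveHTML();
}

// Usage with a small, made-up document:
$in = "<!DOCTYPE html>\n<html>\n  <body>\n    <ul>\n      <li>One   two</li>\n"
    . "      <li>Three</li>\n    </ul>\n    <pre>keep\n   this</pre>\n  </body>\n</html>";
$out = minify_html_dom($in);
echo $out;
```

Some caveats with this sketch: the parser behind DOMDocument predates HTML5, it assumes ISO-8859-1 when the document declares no charset, and saveHTML() writes out a complete document, so fragments come back wrapped in html and body tags.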

Delan Azabani

Are there any tools that can do it?

Yes, here's a tool you could include into a build process or work into a web cache layer: https://code.google.com/archive/p/htmlcompressor/

Or, if you're looking for a tool to minify HTML that you paste in, try: http://www.willpeavy.com/minifier/

Sofía
Will Peavy

Are there any tools that can do it?

You can use the CodVerter Online Web Development Editor to compress mixed HTML code.
The compressor has been tested multiple times for reliability and accuracy.
(Full disclosure: I am one of the developers.)

Jonathan Applebaum

You can use the Pretty Diff tool (http://prettydiff.com/?m=minify&html). It also minifies any CSS and JavaScript in the HTML code, and the minification is done in a way that does not prevent later beautification of the HTML back into readable form.

austincheney