10

I need to pre-compress some very large HTML/XML/JSON files (large data dumps) using either gzip or deflate. I never want to serve the files uncompressed. They are so large and repetitive that compression should work very well, and while some older browsers cannot decompress the responses, my typical customers will not be using those browsers (although it would be nice if I could generate some kind of 'hey, you need to upgrade your browser' message).

I auto-generate the files, and I can easily generate .htaccess files to go along with each file type. Essentially, what I want is an always-on version of mod_gunzip. Because the files are large, and because I will be serving them repeatedly, I need a method that allows me to compress once, really well, on the command line.

I have found some information on this site and others about how to do this with gzip, but I wondered if someone could step me through how to do it with deflate. Bonus points for a complete answer that includes what my .htaccess file should look like, as well as the command-line code I should use (GNU/Linux) to obtain optimal compression. Super bonus points for an answer that also addresses how to send a "sorry, no file for you" message to non-compliant browsers.

It would be lovely if we could create a "precompression" tag to cover questions like this.

-FT

ftrotter
  • 1
    you might get better responses from serverfault.com – David Waters Jun 16 '10 at 11:06
  • 1
    Looks like a duplicate of http://stackoverflow.com/questions/75482/how-can-i-pre-compress-files-with-mod-deflate-in-apache-2-x – skaffman Jun 16 '10 at 12:30
  • That was helpful, but it discusses only gzip and not deflate pre-compression. It is also in a Rails environment and not a PHP one (although still Apache). The similarities in the posts argue for a precompression tag – ftrotter Jun 17 '10 at 06:39
  • @ftrotter: Creating a tag is as simple as (re)tagging a question with the new tag. – caf Jun 18 '10 at 07:15
  • Not for me, I do not have enough points yet ;) – ftrotter Jun 18 '10 at 13:52
  • 1
    wouldn't this question be more appropriate for Server Fault? – Evan Plaice Jun 21 '10 at 23:00
  • @ftrotter can you control the directories of the cached files? Can you be sure ALL cached content will be say... in /www/htmlbigfiles ? – Frankie Jun 22 '10 at 00:18
  • Somebody enlighten me please. Why don't you just compress the files right after they're created and serve zipped files as zipped files? Users might also want to have their file packed for storage, and they can always unpack it with a double-click and a drag'n'drop. – naugtur Jun 24 '10 at 18:09

4 Answers

8

Edit: Found AddEncoding in mod_mime

This works:

<IfModule mod_mime.c>
 <Files "*.html.gz">
  ForceType text/html
 </Files>
 <Files "*.xml.gz">
  ForceType application/xml
 </Files>
 <Files "*.js.gz">
  ForceType application/javascript
 </Files>
 <Files "*.gz">
  AddEncoding gzip .gz
 </Files>
</IfModule>

The docs make it sound like only the AddEncoding should be needed, but I didn't get that to work.

Also, Lighttpd's mod_compression can compress and cache (the compressed) files.
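
The configuration above assumes the .gz files already exist on disk. A minimal sketch of the one-time compression step, written in Python rather than as a shell command, might look like the following. The function name precompress_gzip and the file naming (original name plus .gz, so that dump.html becomes dump.html.gz and matches the Files patterns above) are assumptions for illustration, not part of the answer.

```python
import gzip
import shutil
from pathlib import Path


def precompress_gzip(src: Path) -> Path:
    """Write a gzip copy of src next to it (name + '.gz') and return its path.

    Hypothetical helper: compresslevel=9 trades CPU time for the smallest
    output, which suits a compress-once, serve-many workflow.
    """
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=9) as fout:
        # copyfileobj streams in chunks, so large dumps are never
        # loaded into memory all at once.
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling precompress_gzip(Path("dump.html")) would leave dump.html untouched and create dump.html.gz beside it; whether to delete the original afterwards is left to the caller.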

Zash
  • If you can include the mime type code I would accept this answer, looks like no one is going to give me a complete answer including the deflate option... – ftrotter Jun 20 '10 at 13:43
  • Would that also be a: `Header set Content-Encoding: deflate`? – maxwellb Jun 21 '10 at 18:50
3

If I were you, I would look at inbuilt filesystem compression instead of doing this at the apache layer.

On Solaris, ZFS has transparent compression: set the compression property on the filesystem (zfs set compression=on). Similarly, Windows can compress folders, and Apache will serve the content oblivious to the fact that it's compressed on disk. Linux also has filesystems that do transparent compression.

Jubal
  • great comment. What file-system on Linux and any advice on doing this in a cloud instance? How to properly set the headers (so the clients can understand the content?) – ftrotter Jun 22 '10 at 19:10
  • It's not as elegant on linux, but there are fuse modules that will do transparent compression/decompression. Like this one: http://miio.net/wordpress/projects/fusecompress/ You wouldn't have to do anything with the headers in apache, because as far as apache's concerned, they're normal files. :-) – Jubal Jun 22 '10 at 19:33
  • 2
    I don't see how this answer addresses the problem. It sounds as if ftrotter wants to pre-compress the files to save the processing overhead at request time. If using a transparent file system compression, Apache will still have to re-compress at request time. – Jason R. Coombs Jun 25 '10 at 11:11
  • I think I must have misread the question. I thought the intent was to save space on the machine, but you're right, after re-reading the question I understand. – Jubal Jun 25 '10 at 16:21
2

For the command line, compile zlib's zpipe: http://www.zlib.net/zpipe.c and then

zpipe < BIGfile.html > BIGfile.htmlz

for example.

Then, using Zash's example, set up a filter to change the header. Note that zpipe writes zlib-wrapped deflate streams rather than raw deflate; the zlib format is what HTTP's "deflate" content encoding specifies, and modern browsers probably support it.

For another way to compress files, take a look at pigz with its zlib (-z) or PKWare zip (-K) compression options. Test whether browsers accept the results when they are served with Content-Encoding set.
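
For anyone who prefers not to compile zpipe, the same kind of output can be sketched with Python's zlib module. This is only an illustration under assumptions: the function name deflate_file and the raw switch are hypothetical, and level 9 corresponds to Z_BEST_COMPRESSION. With wbits=15 the output is a zlib-wrapped stream like the one zpipe writes; with wbits=-15 it is a raw deflate stream with no header or checksum.

```python
import zlib


def deflate_file(src: str, dst: str, raw: bool = False,
                 chunk_size: int = 1 << 20) -> None:
    """Compress src into dst at the highest compression level.

    raw=False -> zlib-wrapped deflate (2-byte header, Adler-32 trailer).
    raw=True  -> raw deflate, no header or checksum.
    """
    wbits = -15 if raw else 15
    comp = zlib.compressobj(9, zlib.DEFLATED, wbits)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(chunk_size)
            if not block:
                break
            fout.write(comp.compress(block))
        # flush() emits whatever is still buffered and ends the stream.
        fout.write(comp.flush())
```

Producing both variants of one test file and serving each with Content-Encoding: deflate is a quick way to see which form the target browsers actually accept.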

maxwellb
  • Oh, and change Z_DEFAULT_COMPRESSION in zpipe to Z_BEST_COMPRESSION. – maxwellb Jun 21 '10 at 19:03
  • does this do "deflate" compression or just gzip? – ftrotter Jun 22 '10 at 11:23
  • zpipe, at least, does deflate. Try and set up a test file for pigz compression, I honestly just don't have the test environment to test this myself right now. Pigz also will compress faster by utilizing multiple cores. Woo. – maxwellb Jun 22 '10 at 14:56
0

A quick way to compress content without dealing directly with mod_gzip/mod_deflate is to use ob_gzhandler and modify the headers (before any output is sent to the browser).

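<?php
/* Replace CHANGE_ME with the correct MIME type of your large file,
 e.g. application/json
*/

// Buffer all output and gzip it when the browser accepts gzip.
ob_start('ob_gzhandler');
// The charset parameter uses "=", not ":".
header('Content-Type: CHANGE_ME; charset=UTF-8');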
header('Cache-Control: must-revalidate');
$offset = 60 * 60 * 2 ;
$ExpStr = 'Expires: ' . gmdate('D, d M Y H:i:s',time() + $offset) . ' GMT';
header($ExpStr);

/* Stuff to generate your large files here */
Javi Stolz
  • This is doing gzip on the fly, but the file already exists as HTML/JSON/XML/whatever on the disk. I suppose that I could use PHP like this to generate the right headers and then echo the file (or equivalent), but isn't there a way to do that in just Apache? – ftrotter Jun 17 '10 at 06:41
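
The header decision discussed in the comment above, together with the "sorry, no file for you" case from the question, can be sketched independently of the server. The following Python function is only an illustration: the name pick_encoding and its defaults are hypothetical, and it handles just the common forms of the Accept-Encoding request header.

```python
def pick_encoding(accept_encoding: str, available=("gzip", "deflate")):
    """Return the first pre-compressed encoding the client accepts, or None.

    Parses tokens such as "gzip", "deflate;q=0.5" or "*"; an entry with
    q=0 counts as explicitly refused.
    """
    accepted = {}
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        params = params.strip().lower()
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unparsable weight as a refusal
        accepted[token.strip().lower()] = q
    for enc in available:
        # Fall back to the wildcard weight when the encoding is not listed.
        if accepted.get(enc, accepted.get("*", 0.0)) > 0:
            return enc
    return None
```

When the function returns a name, the response would carry that value in Content-Encoding along with the matching pre-compressed file; when it returns None, one option is to answer with an error status such as 406 Not Acceptable and a short "please upgrade your browser" page instead of the file.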