16

I need to serve some data from my database in a zip file, streaming it on the fly such that:

  • I do not write a temporary file to disk
  • I do not compose the whole file in RAM

I know that I can do streaming generation of zip files to the filesystem using ZipOutputStream. I also know that I can do streaming output from a Rails controller by setting response_body to a Proc. What I need (I think) is a way of plugging those two things together. Can I make Rails serve a response from a ZipOutputStream? Can I get ZipOutputStream to give me incremental chunks of data that I can feed into my response_body Proc? Or is there another way?

kdt
  • ZipOutputStream cannot do that because it seeks back and forth through the stream while writing the zipped data (see `ZipOutputStream#update_local_headers`, called from `ZipOutputStream#close`). Thus, it's impossible to serve chunks of data with ZipOutputStream before the operation completes. – Rômulo Ceccon Feb 22 '11 at 18:32

5 Answers

11

Short Version

https://github.com/fringd/zipline

Long Version

so j05h's answer didn't work for me in rails 3.1.1

i found a youtube video that helped, though.

http://www.youtube.com/watch?v=K0XvnspdPsc

the crux of it is creating an object that responds to each... this is what i did:

  class ZipGenerator                                                                    
    def initialize(model)                                                               
      @model = model                                                                    
    end                                                                                 
                                                                                        
    def each( &block )                                                                  
      output = Object.new                                                               
      output.define_singleton_method :tell, Proc.new { 0 }                              
      output.define_singleton_method :pos=, Proc.new { |x| 0 }                          
      output.define_singleton_method :<<, Proc.new { |x| block.call(x) }                
      output.define_singleton_method :close, Proc.new { nil }                           
      Zip::IoZip.open(output) do |zip|                                                  
        @model.attachments.all.each do |attachment|                                     
          zip.put_next_entry "#{attachment.name}.pdf"                                   
          file = attachment.file.file.send :file                                        
          file = File.open(file) if file.is_a? String                                   
          while buffer = file.read(2048)                                                
            zip << buffer                                                               
          end                                                                           
        end                                                                             
      end                                                                               
      sleep 10                                                                          
    end                                                                                 
                                                                                        
  end
                                                                                  
  def getzip                                                                            
    self.response_body = ZipGenerator.new(@model)                                       
                                                                                        
    # this is a hack to prevent middleware from buffering
    headers['Last-Modified'] = Time.now.to_s                                            
  end                                                                                   
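
For reference, the `each` contract that this approach relies on can be shown without Rails or rubyzip. This is only a minimal sketch: `ChunkedBody` and its chunk size are hypothetical names chosen for illustration, and it demonstrates just the idea that a response body can be any object whose `each` yields strings one piece at a time. It is not a working zip stream.

```ruby
require "stringio"

# Hypothetical example class (not from Rails or rubyzip).
# A streaming response body is any object that responds to #each
# and yields String chunks; the server writes each chunk as it arrives.
class ChunkedBody
  def initialize(io, chunk_size = 2048)
    @io = io
    @chunk_size = chunk_size
  end

  def each
    # Return an Enumerator when called without a block.
    return enum_for(:each) unless block_given?

    # IO#read(n) returns nil at end of input, which ends the loop.
    while (chunk = @io.read(@chunk_size))
      yield chunk
    end
  end
end

# Usage: the consumer never sees more than one chunk at a time.
body = ChunkedBody.new(StringIO.new("x" * 5000))
body.each { |chunk| chunk.bytesize } # a server would write the chunk here
```

In a controller this kind of object would be what gets assigned to `response_body`, as in `getzip` above.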

EDIT:

the above solution didn't ACTUALLY work... the problem is that rubyzip needs to jump around the file to rewrite the headers for entries as it goes. in particular, the compressed size has to appear in the header BEFORE the data, but it is only known after the data has been compressed. this is just not possible in a truly streaming situation... so ultimately this task may be impossible. there is a chance that it might be possible to buffer a whole file at a time, but this seemed less worth it. ultimately i just wrote to a tmp file... on heroku i can write to Rails.root/tmp. less instant feedback, and not ideal, but necessary.

ANOTHER EDIT:

i got another idea recently... we COULD know the compressed size of the files if we do not compress them. the plan goes something like this:

subclass the ZipOutputStream class as follows:

  • always use the "stored" compression method, in other words do not compress
  • ensure we never seek backwards to change file headers, get it all right up front
  • rewrite any code related to TOC that seeks

I haven't tried to implement this yet, but will report back if there's any success.

OK ONE LAST EDIT:

In the zip standard: http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers

they mention that there's a bit you can flip to put the size, compressed size and crc AFTER a file. so my new plan was to subclass ZipOutputStream so that it

  • sets this flag
  • writes sizes and CRCs after the data
  • never rewinds output
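
The plan above can be sketched in plain Ruby. This is an illustration under stated assumptions, not the code from zipline or rubyzip: `StreamingStoredZip` is a hypothetical class, it always uses the "stored" method (no compression), it uses a fixed 1980-01-01 timestamp, and it ignores files over 4 GB (no Zip64). The flag in question is bit 3 of the general purpose bit flag, which tells readers that the CRC and sizes follow the data in a data descriptor.

```ruby
require "zlib"
require "stringio"

# Hypothetical sketch: writes a zip to any sink that responds to <<,
# strictly front to back, never seeking.
class StreamingStoredZip
  FLAG_DATA_DESCRIPTOR = 0x0008 # general purpose bit 3
  DOS_TIME = 0
  DOS_DATE = (1 << 5) | 1       # 1980-01-01, fixed for simplicity

  def initialize(sink)
    @sink = sink
    @offset = 0
    @entries = []
  end

  def add(name, io, chunk_size = 16 * 1024)
    entry = { name: name.b, offset: @offset, crc: 0, size: 0 }
    # Local header: CRC and sizes are zero because bit 3 is set.
    emit [0x04034b50, 20, FLAG_DATA_DESCRIPTOR, 0, DOS_TIME, DOS_DATE,
          0, 0, 0, entry[:name].bytesize, 0].pack("VvvvvvVVVvv")
    emit entry[:name]
    while (chunk = io.read(chunk_size))
      entry[:crc] = Zlib.crc32(chunk, entry[:crc])
      entry[:size] += chunk.bytesize
      emit chunk
    end
    # Data descriptor: CRC, compressed size, uncompressed size AFTER the data.
    emit [0x08074b50, entry[:crc], entry[:size], entry[:size]].pack("VVVV")
    @entries << entry
  end

  def finish
    cd_start = @offset
    @entries.each do |e|
      # Central directory entry repeats the now-known CRC and sizes.
      emit [0x02014b50, 20, 20, FLAG_DATA_DESCRIPTOR, 0, DOS_TIME, DOS_DATE,
            e[:crc], e[:size], e[:size], e[:name].bytesize,
            0, 0, 0, 0, 0, e[:offset]].pack("VvvvvvvVVVvvvvvVV")
      emit e[:name]
    end
    # End of central directory record.
    emit [0x06054b50, 0, 0, @entries.size, @entries.size,
          @offset - cd_start, cd_start, 0].pack("VvvvvVVv")
  end

  private

  def emit(bytes)
    @sink << bytes
    @offset += bytes.bytesize
  end
end

# Usage: the sink could equally be a block-backed object in a response body.
out = String.new
zip = StreamingStoredZip.new(out)
zip.add("hello.txt", StringIO.new("hello"))
zip.finish
```

Because the writer only tracks its own byte offset, the sink never needs `tell`, `pos=` or `rewind`, which is what makes it usable with a streaming response.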

furthermore i needed to get all the hacks in order to stream output in rails fixed up...

anyways it all worked!

here's a gem!

https://github.com/fringd/zipline

fringd
3

It is now possible to do this directly:

class SomeController < ApplicationController
  def some_action
    compressed_filestream = Zip::ZipOutputStream.write_buffer do |zos|
      zos.put_next_entry "some/filename.ext"
      zos.print data
    end
    compressed_filestream.rewind
    respond_to do |format|
      format.zip do
        send_data compressed_filestream.read, filename: "some.zip"
      end
    end
    # or some other return of send_data
  end
end
noel
3

I had a similar issue. I didn't need to stream directly, but only had your first case of not wanting to write a temp file. You can easily modify ZipOutputStream to accept an IO object instead of just a filename.

module Zip
  class IOOutputStream < ZipOutputStream
    def initialize io
      super '-'
      @outputStream = io
    end

    def stream
      @outputStream
    end
  end
end

From there, it should just be a matter of using the new Zip::IOOutputStream in your Proc. In your controller, you'd probably do something like:

self.response_body = proc do |response, output|
  Zip::IOOutputStream.open(output) do |zip|
    my_files.each do |file|
      zip.put_next_entry file
      # parentheses are required here; `zip << IO.read file` does not parse
      zip << IO.read(file)
    end
  end
end
j05h
  • this doesn't work by itself... zip files expect size, compressed_size, and a CRC before the data... this code just builds the file in memory, and the server still waits until it's finished to start sending. use my gem https://github.com/fringd/zipline – fringd Jun 13 '12 at 00:46
0

Use chunked HTTP transfer encoding for the output: send the HTTP header "Transfer-Encoding: chunked" and frame the output according to the chunked encoding specification, so there is no need to know the resulting ZIP file size at the beginning of the transfer. This can be easily coded in Ruby with the help of Open3.popen3 and threads.
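
To make the framing concrete, here is a minimal sketch of chunked encoding in Ruby. `http_chunk` and `LAST_CHUNK` are hypothetical names used for illustration; in practice the web server or Rack middleware usually applies this framing for you, so treat it as an explanation of the wire format rather than something to paste into a controller.

```ruby
# Each chunk is: size in hex, CRLF, the raw bytes, CRLF.
def http_chunk(data)
  "#{data.bytesize.to_s(16)}\r\n".b + data.b + "\r\n".b
end

# A zero-length chunk followed by an empty line ends the body.
LAST_CHUNK = "0\r\n\r\n".b

# Usage: frame pieces as they become available, then terminate.
wire = String.new
["PK...", "more bytes"].each { |piece| wire << http_chunk(piece) }
wire << LAST_CHUNK
```

The client reassembles the pieces in order, so the total length never has to be announced up front.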

Konstantin
0

This is the link you want:

http://info.michael-simons.eu/2008/01/21/using-rubyzip-to-create-zip-files-on-the-fly/

It builds and generates the zipfile using ZipOutputStream and then uses send_file to send it directly out from the controller.

Taryn East
  • Nope. The question specifies "such that ... I do not write a temporary file to disk". That example creates a temporary file. It's also more or less identical to the first link in the question. – kdt Feb 15 '11 at 15:34
  • The question specifies that the temporary file is not written to disc. The reasonable assumption there is that you don't want temp files piling up in some random directory - having to be destroyed. The solution given destroys the temporary file immediately after it's used. If there's an alternative assumption, please let us know - or your question is not complete. – Taryn East Feb 15 '11 at 18:45
  • As it is - your two requirements are almost mutually exclusive. Either it's on disc, or it's in RAM... so what is it that you really want and why? – Taryn East Feb 15 '11 at 18:50
  • @TarynEast You CAN compress/send a whole DVD with a server that has only 100Mb RAM + 100 MB hard-drive. That means sending the zipped content immediately rather than streaming it. So kdt's requirements are not mutually exclusive. Maybe kdt wants to efficiently send enormous amounts of data using a server that is not too expensive. Another advantage is that compression and download times are parallel rather than added. Cheers! – Nicolas Raoul Sep 22 '11 at 10:22