
I have a Rails application that fetches a csv.zip file from S3. Is there a way to stream the S3 file and unzip it on the fly using RubyZip or another gem? I don't want to download the whole file into memory.

I'm using a block for downloading the S3 object. When you pass a block to #get_object, chunks of data are yielded as they are read off the socket.

s3.get_object(bucket: 'bucket-name', key: 'object-key') do |chunk|
  puts(chunk)
end

When I try to unzip a chunk using RubyZip, it throws an error:

Zip::File.open(chunk) do |zip_file|
  zip_file.each do |entry|
    puts(entry.get_input_stream.read)
  end
end
  • What research have you done for the streaming question, and what error are you getting? – D. SM May 05 '20 at 00:49
  • I'll get back to you on the error. But it sounds like it's not possible to stream a file and unzip chunks, given the format of zip files. You have to download the entire file to disk and then unzip it: https://stackoverflow.com/questions/23377263/stream-and-unzip-large-csv-file-with-ruby To avoid this, I looked at using gzip files instead. The problem with that is Zlib seems to have issues decompressing S3 multipart uploads. So I am now exploring building gzip files with a single S3 upload. – Mimi R. May 06 '20 at 14:49

1 Answer


If Zlib attempts to decompress a gzip file that was created via an S3 multipart upload, it is only able to decompress the first part.

Zlib can decompress files uploaded to S3 in a single request, so if your application can use S3 single uploads instead of multipart uploads, I recommend exploring that route.
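With a single-upload gzip object, decompression can then be streamed chunk by chunk with `Zlib::Inflate`, so the whole file never has to be held in memory. Below is a minimal sketch of that idea; the `stream_decompress` helper is illustrative (not part of the AWS SDK), and it is simulated here with locally gzipped data rather than a real S3 call:

```ruby
require "zlib"

# Incrementally decompress gzip data delivered in arbitrary chunks,
# the way #get_object yields them off the socket.
# MAX_WBITS + 32 tells zlib to auto-detect the gzip header.
def stream_decompress(chunks)
  inflater = Zlib::Inflate.new(Zlib::MAX_WBITS + 32)
  chunks.each do |chunk|
    data = inflater.inflate(chunk)
    yield data unless data.empty?
  end
ensure
  inflater.close
end

# With the AWS SDK, chunks would come straight from S3
# (bucket and key are placeholders):
#
#   s3.get_object(bucket: 'bucket-name', key: 'file.csv.gz') do |chunk|
#     data = inflater.inflate(chunk)
#     process(data)
#   end
#
# Here we simulate the chunked delivery with a locally gzipped CSV:
gzipped = Zlib.gzip("id,name\n1,alice\n2,bob\n")
chunks  = gzipped.bytes.each_slice(16).map { |b| b.pack("C*") }

csv = +""
stream_decompress(chunks) { |part| csv << part }
puts csv
```

Each call to `inflate` returns whatever decompressed bytes are available so far, so you can parse CSV rows as they arrive instead of waiting for the full download.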