
I'm using Paperclip and AWS S3 for file storage.

I have a Car model and an Image model. A car `has_many :images`, and the Image model uses `has_attached_file :file`.

A car can have any number of images.

I want a way to download all of a car's images at once.

I have working code:

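```ruby
require 'open-uri' # URI.open for fetching remote files
require 'zip'      # rubyzip

def download
  @images = @car.images

  # Build the whole archive in memory, one image at a time
  compressed_filestream = Zip::OutputStream.write_buffer do |zos|
    @images.each do |img|
      zos.put_next_entry img.file_file_name
      # Kernel#open no longer opens URLs in Ruby 3; URI.open does
      zos.print URI.open(img.file.url, &:read)
    end
  end

  compressed_filestream.rewind
  send_data compressed_filestream.read, filename: "#{@car.name}.zip"
end
```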

When `/cars/1/download` is requested, the controller action above runs. It works, but it is really slow: downloads take about 6 seconds per megabyte. I want a faster solution for downloading all the images at once.

I know that you can go to any web page, right-click, and choose "Save As..." to save that page. When the page has images, they appear in a new folder after the download finishes, and the download is really fast. I guess this is because the browser has already downloaded those images, so it just copies them to my computer instead of fetching them again. If the browser can download an HTML file and a folder of assets, we should be able to make it download just a folder of images, right?

I have a few ideas that I will work on, but I'd like to know if anyone has a faster solution, or at least some input on my current ideas.

Ideas:

  1. Instead of building a new .zip file every time someone wants to download, update the .zip file every time the car's images change. This way, when the user requests all the images, the file already exists and they just download it. But where should these .zip files go? Where and how do we save them?

  2. In JavaScript you can create Blob objects from an image URL. Could we load all the images after the page has loaded? The page load stays fast, and while the user is viewing the page, the browser downloads the images in the background. If the user then decides to download them, the download is fast.

  3. Maybe my controller action could be improved to build a temporary .zip file faster.
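For idea 1, one possible answer to "where and how do we save them" is to key each stored zip by a fingerprint of the car's images, so a new zip is built only when the images actually change and an outdated one is simply never looked up again. This is a minimal sketch, assuming Paperclip's standard `file_updated_at` column is available; `ImageInfo`, `zip_cache_key`, and the `car-zips/` key prefix are hypothetical names chosen for illustration.

```ruby
require 'digest'

# Hypothetical stand-in for a Paperclip-backed Image record. Paperclip's
# standard columns include <attachment>_file_name and <attachment>_updated_at.
ImageInfo = Struct.new(:file_file_name, :file_updated_at)

# Hypothetical helper: returns a storage key for a car's zip. The fingerprint
# changes when an image is added, removed, renamed, or re-uploaded, so the
# key of an outdated zip is never requested again.
def zip_cache_key(car_id, images)
  parts = images.map { |img| "#{img.file_file_name}:#{img.file_updated_at.to_i}" }
  # Sorting makes the key independent of the order the images are loaded in
  fingerprint = Digest::SHA256.hexdigest(parts.sort.join("\n"))
  "car-zips/#{car_id}/#{fingerprint}.zip"
end

uploaded_at = Time.at(1_484_700_000)
images = [ImageInfo.new('front.jpg', uploaded_at), ImageInfo.new('back.jpg', uploaded_at)]
puts zip_cache_key(1, images)
```

The trade-off of this design is that old zips are left behind rather than edited in place, so they would need to be deleted separately.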

Ideas anyone?

Cruz Nunez
    Assuming network bandwidth is not the bottleneck, you'd do well to parallelize downloads and mutex on zipping. Typhoeus makes this easy and efficient, fwiw. Probably also worth *not* compressing much / at all. JPG data doesn't compress much. [This is another interesting option.](http://stackoverflow.com/a/21210576/203130) – coreyward Jan 18 '17 at 23:19
  • @coreyward I got it working and went from downloading 6MB in 40 seconds to 1.7 seconds. I don't know what you mean by mutex on zipping. I also didn't use Typhoeus. I'm happy with these numbers. Will mutex on zipping and Typhoeus improve it even more? – Cruz Nunez Jan 20 '17 at 07:14
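A likely reading of "parallelize downloads and mutex on zipping" is this: the downloads run concurrently, but the zip stream is a single shared object that is not safe to write from several threads, so each finished download is written into it while holding a `Mutex`. The sketch below uses plain Ruby threads instead of Typhoeus, caps the number of worker threads, and needs Ruby 2.3+ for `Queue#close`. `parallel_fetch_and_write`, `fetch`, `write`, and `pool_size` are hypothetical names, and the rubyzip wiring appears only as a comment.

```ruby
# Hypothetical helper: fetches every item concurrently with at most
# `pool_size` worker threads and serializes calls to `write` with a Mutex,
# because the shared sink (e.g. a Zip::OutputStream) is not thread-safe.
def parallel_fetch_and_write(items, fetch:, write:, pool_size: 8)
  raise ArgumentError, 'pool_size must be positive' unless pool_size.positive?

  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once drained, pop returns nil instead of blocking
  lock = Mutex.new

  workers = Array.new([pool_size, items.size].min) do
    Thread.new do
      # Items must not be nil, since nil marks the end of the closed queue
      while (item = queue.pop)
        data = fetch.call(item)                     # slow I/O, overlaps across threads
        lock.synchronize { write.call(item, data) } # only one writer at a time
      end
    end
  end
  workers.each(&:join) # join re-raises an exception that killed a worker
end

# Example wiring inside the controller (assumes rubyzip and open-uri):
#   Zip::OutputStream.write_buffer do |zos|
#     parallel_fetch_and_write(@car.images,
#       fetch: ->(img) { URI.open(img.file.url, &:read) },
#       write: ->(img, data) { zos.put_next_entry(img.file_file_name); zos << data })
#   end
```

Capping the pool keeps a car with hundreds of images from opening hundreds of connections at once, while the lock keeps the zip entries from interleaving.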

2 Answers


Your best bet here is to use a simple caching layer together with the AWS SDK for Ruby.

First, create an S3 bucket to hold the zips, for example `car-image-zips` (S3 bucket names can't contain underscores). When someone hits download, check this bucket for the car's zip. If it exists, send it. If not, download all the files, build the zip, upload it to the bucket, and send it. If your app uses something like Sidekiq for background jobs, you could also move the upload into a background job so the user doesn't wait for it.

This assumes `@car` has an `id` and that the AWS SDK is configured with credentials and a region. `download` would then look something like this:

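```ruby
require 'aws-sdk-s3'
require 'open-uri'
require 'zip'

def download
  # Assumes credentials/region are configured; the bucket name is an example.
  # The resource could be built once (e.g. in an initializer) instead.
  bucket = Aws::S3::Resource.new.bucket('car-image-zips')
  zip_object = bucket.object("#{@car.id}.zip") # sample naming scheme
  filename = "#{@car.name}.zip"

  if zip_object.exists?
    # Cache hit: send back the zip built on an earlier request
    send_data zip_object.get.body.read, filename: filename, type: 'application/zip'
  else
    # Cache miss: zip up the files as in the question
    compressed_filestream = Zip::OutputStream.write_buffer do |zos|
      @car.images.each do |img|
        zos.put_next_entry img.file_file_name
        zos.print URI.open(img.file.url, &:read)
      end
    end
    compressed_filestream.rewind
    data = compressed_filestream.read

    # Store the zip under the same key so the next request is a cache hit
    # (this upload could be moved into a background job, e.g. Sidekiq)
    zip_object.put(body: data)
    send_data data, filename: filename, type: 'application/zip'
  end
end
```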

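Granted, I have not tested this, but it should give you a general idea of how to do this with basic caching. It's not perfect: each car's zip still has to be built once, and the cached zip must be deleted or rebuilt whenever that car's images change, or users will get an outdated archive. Still, it's a big gain for a relatively simple solution.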

If you really wanted to optimize, you could use something like an AWS Lambda function that rebuilds the zip every time a file is uploaded to S3, so it is always ready for download.

jsookiki

Here is an answer for option 3. In the original code, `.each` waits for each iteration to finish before starting the next one. If downloading one picture from S3 to the server takes one second on average, then 40 images take about 40 seconds. Instead, download all the files at the same time using threads.
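The arithmetic above (sequential time is roughly the number of images times the average latency, while concurrent time is roughly one latency) can be checked with a small stdlib-only simulation. Here `sleep` stands in for waiting on the network. Like a blocking socket read, it releases Ruby's global VM lock, so the threads' waits overlap. This is a simulation, not a measurement of S3; `fake_download`, `LATENCY`, and `COUNT` are hypothetical.

```ruby
require 'benchmark'

# Hypothetical stand-in for one image download
def fake_download(seconds)
  sleep seconds
  :done
end

LATENCY = 0.1 # assumed per-image latency in seconds (simulated, not measured)
COUNT = 10    # number of images

# One after another, as with `.each`: roughly COUNT * LATENCY seconds
sequential = Benchmark.realtime do
  COUNT.times { fake_download(LATENCY) }
end

# One thread per image: roughly LATENCY seconds, because the waits overlap
threaded = Benchmark.realtime do
  COUNT.times.map { Thread.new { fake_download(LATENCY) } }.each(&:join)
end

puts format('sequential: %.2fs, threaded: %.2fs', sequential, threaded)
```

Real downloads will not scale this cleanly, because bandwidth and S3 response times are shared between the threads.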

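```ruby
require 'open-uri'
require 'zip'

class CarsController < ApplicationController
  # Assumes @car is set by a before_action (not shown)
  def download
    images = load_images

    filestream = write_file(images)

    send_data filestream.read, filename: "#{@car.name}.zip", type: 'application/zip'
  end

  # Helpers are private so they can't be routed to as actions
  private

  # Download every image concurrently, one thread per image.
  # Thread#value waits for each thread and returns its result, so the array
  # keeps the images' order and no array is shared between threads.
  def load_images
    threads = @car.images.map do |f|
      Thread.new do
        { name: f.file_file_name, file: URI.open(f.file.url, &:read) }
      end
    end

    threads.map(&:value)
  end

  # Write the downloaded images into an in-memory zip. JPEG data barely
  # compresses, so compression is turned off (note: this is a global setting).
  def write_file(images)
    Zip.default_compression = Zlib::NO_COMPRESSION

    stream = Zip::OutputStream.write_buffer do |zos|
      images.each do |img|
        zos.put_next_entry img[:name]
        zos.print img[:file]
      end
    end

    stream.rewind

    stream
  end
end
```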

Each thread downloads one image, and the images and their names are collected into an array. That array is passed to a method that writes the zip file. Once the zip file is written, it is sent to the user.

This reduced the controller action's runtime from 40 seconds to 1 to 2 seconds.

Cruz Nunez