Every night we generate CSV reports from our system and back up these files to Amazon S3. We then need to copy these files (usually 1-5 files, none larger than 5 MB) from Amazon S3 to another FTP server.

What's the best way to do this? The system is written in Ruby on Rails, and the CSV generation runs every night via cron.

I can upload a single file from my laptop like this:

  def upload_to_ftp
    # Uses the net-sftp gem
    Net::SFTP.start('FTP_IP', 'username', :password => 'password') do |sftp|
      sftp.upload!("/Users/my_name/Downloads/report.csv", "/folder_on_the_ftp/report.csv")
    end
    render :nothing => true
  end

But how can I upload a few files, not from the local hard drive, but from Amazon S3?

Thank you

user984621

1 Answer

Perhaps I'm not imaginative enough, but I think you'll need to download the files to your server and then upload them to the FTP.

The only piece you're missing is reading from S3; with ruby-aws-sdk it's simple, see http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html
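
For reference, reading an object with the v1 aws-sdk gem (the API the linked docs describe) looks roughly like this; the bucket and key names are placeholders:

  require 'aws-sdk'   # v1, matching the linked AWSRubySDK docs

  s3  = AWS::S3.new
  obj = s3.buckets['my-bucket'].objects['reports/report.csv']   # placeholders

  # For reports of a few MB it's fine to pull the whole object into memory.
  csv_data = obj.read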

But if the files grow larger than 5MB, you can use IO streams.

As far as I know, Net::SFTP#upload! accepts an IO object as input. That's one side of the equation.
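
For example, something along these lines should work (host, credentials and paths are placeholders, and the StringIO just stands in for whatever IO you end up using):

  require 'net/sftp'
  require 'stringio'

  # Sketch only: an IO-like source can be passed to upload! together with
  # a remote path, so nothing has to be written to the local disk first.
  Net::SFTP.start('FTP_IP', 'username', :password => 'password') do |sftp|
    csv = StringIO.new("id,amount\n1,100\n")   # stand-in for real report data
    sftp.upload!(csv, '/folder_on_the_ftp/report.csv')
  end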

Then use ruby-aws-sdk to download the CSVs with streaming reads (again, see http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html). In one thread, write to a 'buffer' (an instance of a class deriving from IO):

  # Stream the S3 object in chunks, feeding each chunk to the shared 'buffer' IO.
  s3 = AWS::S3.new
  obj = s3.buckets['my-bucket'].objects['key']
  obj.read do |chunk|
    buffer.write(chunk)
  end

In another thread, run the upload using the 'buffer' object as the source.
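
One way to wire the two threads together is an IO.pipe, whose writer end plays the role of 'buffer' and whose reader end is handed to upload!. A rough, untested sketch (bucket, key, host and credentials are placeholders):

  require 'aws-sdk'   # v1
  require 'net/sftp'

  # The pipe has a small internal buffer, so the downloading thread
  # naturally blocks whenever the SFTP upload falls behind.
  reader, writer = IO.pipe

  s3  = AWS::S3.new
  obj = s3.buckets['my-bucket'].objects['reports/report.csv']

  downloader = Thread.new do
    obj.read { |chunk| writer.write(chunk) }
    writer.close   # signals end-of-file to the reading side
  end

  Net::SFTP.start('FTP_IP', 'username', :password => 'password') do |sftp|
    sftp.upload!(reader, '/folder_on_the_ftp/report.csv')
  end

  downloader.join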

Note that I haven't used this solution myself, but it should get you started.

Also note that you'll be buffering the incoming data. Unless you use a temporary file and have sufficient disk space on the server, you need to cap how much data sits in the 'buffer' at any one time (i.e. call #write only while the buffered amount is below some maximum).

This is Ruby; it's not as if it has first-class support for concurrency.

Personally, I'd either upload to S3 and to the SFTP from the same code, or, if that's impossible, download each CSV file in full and then upload it to the SFTP server. I'd switch to streams only if that becomes necessary as an optimization. (Just my $.0002.)
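
A sketch of that simple variant, downloading each report in full to a temp file and then uploading it (bucket name, key prefix, host, credentials and remote folder are placeholders):

  require 'aws-sdk'   # v1
  require 'net/sftp'
  require 'tempfile'

  s3 = AWS::S3.new

  Net::SFTP.start('FTP_IP', 'username', :password => 'password') do |sftp|
    s3.buckets['my-bucket'].objects.with_prefix('reports/').each do |obj|
      name = File.basename(obj.key)
      Tempfile.create(name) do |file|
        file.binmode
        file.write(obj.read)   # each report is only a few MB
        file.flush
        sftp.upload!(file.path, "/folder_on_the_ftp/#{name}")
      end
    end
  end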

Marcin Bilski
  • Thanks Marcin for your answer. It sounds kind of complicated, considering direct upload to both servers (S3 and the FTP one). I wanted to avoid that scenario, but after all it might be more efficient. – user984621 Nov 16 '15 at 20:02
  • I thought about it some more, and perhaps you'll find this interesting: http://stackoverflow.com/questions/23939179/ftp-sftp-access-to-an-amazon-s3-bucket If the goal is just to make the files easy to access, that should be pretty straightforward. On the other hand, if you're copying to FTP to have an additional backup for safety reasons, don't: https://blog.cloudsecurityalliance.org/2010/05/24/amazon-aws-11-9s-of-reliability/ – Marcin Bilski Nov 16 '15 at 20:47