
I have a workflow that moves 700 GB of files from an FTP server to an on-prem server for Python script processing.

I would like to migrate this process to an AWS S3 bucket for Lambda to process.
I saw AWS DataSync as a reasonably priced solution ($0.0125/GB) to move this data into an S3 bucket, but it doesn't transfer from an FTP site.

Does anyone have suggestions on how to do this?

Note: I've looked into FileZilla Pro, but there is no way to automate that process with a batch command or scripting.

phill
  • [AWS supports SFTP](https://aws.amazon.com/sftp/?whats-ne) for transfers in and out of S3, so it might be useful. – Marcin Apr 06 '20 at 00:40
  • Think through your design. You want to copy the data from an FTP server to S3. This will require both compute and network resources. Then you plan to copy the data from S3 to Lambda for processing. Instead, launch EC2, download the data from FTP to EC2, and process it there. AWS charges for data transfer and for storage. Google `FTP file sync` for tools that can automatically sync from an FTP server to local storage (on EC2). – John Hanley Apr 06 '20 at 03:25
  • The transfer rates using AWS SFTP are completely unreasonable. – phill Apr 06 '20 at 16:29
  • It looks like I can run WinSCP in a scheduled DOS batch script on an EC2 instance (a sketch of that EC2 approach follows these comments), but it seems an unnecessary step if there is a way to load the files directly into Lambda for processing. I didn't know Lambda could store files. – phill Apr 06 '20 at 16:31
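
A minimal sketch of that EC2 approach in Python, assuming `ftplib` from the standard library and `boto3`; the host, credentials, bucket name, and flat file layout are all placeholders (none appear in the thread). Each file is streamed from the FTP data socket straight into S3, so it never has to fit on the instance's disk:

```python
import ftplib

import boto3

# Placeholder endpoint, credentials, and bucket -- replace with your own.
FTP_HOST = "ftp.example.com"
FTP_USER = "user"
FTP_PASS = "password"
BUCKET = "my-landing-bucket"

s3 = boto3.client("s3")

ftp = ftplib.FTP(FTP_HOST)
ftp.login(FTP_USER, FTP_PASS)

# nlst() lists names in the current directory; filter out
# subdirectories here if your server mixes them in.
for name in ftp.nlst():
    # transfercmd() hands back the raw data socket, so the file streams
    # from FTP into S3 without being written to local disk first.
    conn = ftp.transfercmd(f"RETR {name}")
    with conn.makefile("rb") as stream:
        s3.upload_fileobj(stream, BUCKET, name)  # multipart under the hood
    conn.close()
    ftp.voidresp()  # consume the "226 Transfer complete" reply

ftp.quit()
```

Run from cron or a scheduled task on the instance, this replaces the WinSCP batch step; the instance only needs enough memory for boto3's upload buffers, not for the full 700 GB.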

1 Answer


AWS Lambda is not a good choice for such a job, due to the dynamic memory requirements and the unreliable latency between your FTP site and the Lambda function.

It looks like you are trying to copy 700 GB of data into S3 via some AWS service. If that is a correct statement, then please do serious cost calculations for the following:

  1. S3 pricing is a function of the amount of data stored and transferred and the frequency of retrieval. Reading and writing 700 GB of data will cost a significant amount per month (see the rough estimate after this list).

  2. Lambda function execution time and memory. Whenever the Lambda function runs, it will read the file into a temporary in-memory variable. This is where you will incur high cost, since Lambda pricing depends on the amount of memory allocated (and, at the time of writing, a function can be allocated at most ~3 GB of memory and 512 MB of /tmp storage, far short of a 700 GB workload).
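
As a rough illustration of point 1, here is a back-of-envelope estimate. The prices are assumed us-east-1 list prices from around 2020 and the file count is invented, since the thread doesn't give one; check the current AWS pricing pages before relying on any of this:

```python
# Assumed prices (verify against the current S3 pricing page):
#   S3 Standard storage: $0.023 per GB-month
#   PUT requests:        $0.005 per 1,000
#   Data transfer IN to S3: free
gb_stored = 700
file_count = 10_000                            # assumed; not from the thread

storage_per_month = gb_stored * 0.023          # ~$16.10 / month
upload_requests = file_count / 1000 * 0.005    # ~$0.05, one-time
print(f"storage ≈ ${storage_per_month:.2f}/month, "
      f"initial upload requests ≈ ${upload_requests:.2f}")
```

Retrieval (GET) charges and any cross-region transfer come on top of this, which is why the read/write pattern dominates the cost.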

Second, the connection speed between the FTP site and the Lambda edge server is also worth mentioning: the higher the latency, the more quickly you will exhaust your free 1M Lambda request quota.

I would recommend using a Python/Ruby/PHP script, either on the FTP server or on an on-premises machine, to upload the files to S3 buckets (a minimal sketch follows). If you go with this approach, also think about archiving the data to Glacier so that you save money.
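
A minimal sketch of that upload script in Python with `boto3`, assuming a local staging directory and bucket name that are both placeholders. `upload_file` switches to multipart uploads automatically for large files, and `StorageClass="GLACIER"` writes objects straight to archival storage, as suggested above:

```python
import os

import boto3

BUCKET = "my-archive-bucket"    # placeholder bucket name
SOURCE_DIR = "/data/ftp-drop"   # placeholder staging directory

s3 = boto3.client("s3")

for root, _dirs, files in os.walk(SOURCE_DIR):
    for fname in files:
        path = os.path.join(root, fname)
        key = os.path.relpath(path, SOURCE_DIR)
        # upload_file handles multipart uploads automatically for large
        # files; StorageClass=GLACIER archives the object on write.
        s3.upload_file(path, BUCKET, key,
                       ExtraArgs={"StorageClass": "GLACIER"})
        print(f"uploaded {key}")
```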

If you need Lambda code, please let me know; I will be happy to share it. Hope this helps.

HSharma
  • I didn't know there were memory limitations in Lambda for temporarily processing the 700 GB of files; thanks for this. I thought the 1M Lambda limit was just on the scripts it runs. My goal was to offload the processing (very slow; it takes 3 days) from my on-prem server to AWS, since the parsed files ultimately end up in the EC2 server running SQL Server, for faster execution. Would you suggest a better service for this? – phill Apr 08 '20 at 20:47
  • Good to know you are using EC2 and that it is the final destination of the processed files. If you can help me with the intended solution in terms of cost vs. technical feasibility, I can share the AWS solution architecture. – HSharma Apr 09 '20 at 05:54
  • Here are the options I have in mind: 1) Are you looking for the optimal AWS solution? 2) Are you looking for a cost-optimised solution even if some on-premises processing is acceptable? 3) Are you looking for a cost-optimised AWS-only solution, i.e. no on-premises processing anymore? – HSharma Apr 09 '20 at 05:56
  • I'm looking for a cost-optimized AWS-only solution. Do you have any Python code examples with import libraries that work on Lambda? – phill Apr 19 '20 at 16:27