Right now I have a cron job that runs once a day. It pipes the output of a curl command into a file, gzips that file, then uploads it to an S3 bucket. I'd like to move this off my server and into AWS tooling. What's the currently recommended way to do this? Make a Lambda function and schedule it to run daily?

Hugo
user433342

1 Answer

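The most cost-effective option is the one you describe: a Lambda function that downloads, compresses and uploads the file, triggered once a day by a scheduled CloudWatch Events rule. The rule needs permission to invoke the function, which you can grant like this: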

# Allow CloudWatch Events rules in account 123456789012 to invoke the function
aws lambda add-permission --function-name my-function \
                          --action 'lambda:InvokeFunction' \
                          --principal events.amazonaws.com \
                          --statement-id events-access \
                          --source-arn 'arn:aws:events:*:123456789012:rule/*'
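
For a file small enough to fit in memory, the two remaining pieces could look roughly like the Python sketch below: a handler that does what the cron job does, and a helper that creates the daily rule and points it at the function. The URL, bucket, key, rule name and function ARN are hypothetical placeholders, and the `s3` and `opener` parameters exist only so the handler can be exercised without AWS access.

```python
import gzip
import urllib.request

# Hypothetical values: replace with your own URL, bucket, and key.
SOURCE_URL = "https://example.com/export.csv"
BUCKET = "my-backup-bucket"
KEY = "export.csv.gz"


def handler(event, context, s3=None, opener=urllib.request.urlopen):
    """Download SOURCE_URL, gzip it in memory, and upload it to S3."""
    if s3 is None:
        import boto3  # shipped with the Lambda Python runtime
        s3 = boto3.client("s3")
    with opener(SOURCE_URL) as response:
        body = gzip.compress(response.read())
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=body)
    return {"compressed_bytes": len(body)}


def schedule_daily(events, rule_name, function_arn):
    """Create (or update) a once-a-day rule and target the function with it."""
    rule = events.put_rule(
        Name=rule_name, ScheduleExpression="rate(1 day)", State="ENABLED"
    )
    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": function_arn}])
    return rule["RuleArn"]
```

`schedule_daily(boto3.client("events"), "daily-export", function_arn)` would be run once at setup time, not from inside the function. A `cron(...)` schedule expression can replace `rate(1 day)` if the job has to run at a fixed hour.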

[UPDATE]: what if the file to download is 4 GB?

In that case, you have two options: one that takes more work but is more cost-effective, and one that is easier to implement but might cost a bit more.

Option 1: fully serverless

You can design your AWS Lambda function to download the 4 GB content as a stream, compress it chunk by chunk, and upload it to S3 in chunks of 5 MB (the minimum part size for an S3 multipart upload). I am not a compression expert, but I am sure it is possible to find a library that handles this for you. The downside is that you need to write specific code; it will not be as easy as combining the AWS CLI and the gzip command-line tool.
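
As a rough illustration of this option, the sketch below compresses an incoming stream with Python's standard `zlib` module and sends the result to S3 through the multipart upload calls of a boto3 S3 client. The function names `gzip_parts` and `upload_stream` are my own; `chunks` is assumed to be any iterable of byte strings, and `s3` is assumed to be a `boto3.client("s3")` or an object with the same methods.

```python
import zlib

PART_SIZE = 5 * 1024 * 1024  # S3 minimum size for every part except the last


def gzip_parts(chunks, part_size=PART_SIZE):
    """Yield gzip-compressed byte strings of at least part_size bytes.

    Only the final part may be smaller. Concatenated, the parts form one
    valid gzip file.
    """
    compressor = zlib.compressobj(wbits=31)  # wbits=31 selects the gzip container
    buffer = bytearray()
    for chunk in chunks:
        buffer += compressor.compress(chunk)
        if len(buffer) >= part_size:
            yield bytes(buffer)
            buffer.clear()
    buffer += compressor.flush()  # emits the remaining data and the gzip trailer
    if buffer:
        yield bytes(buffer)


def upload_stream(s3, bucket, key, chunks):
    """Compress chunks on the fly and upload them as one multipart object."""
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        for number, body in enumerate(gzip_parts(chunks), start=1):
            result = s3.upload_part(
                Bucket=bucket, Key=key, UploadId=upload_id,
                PartNumber=number, Body=body,
            )
            parts.append({"PartNumber": number, "ETag": result["ETag"]})
        s3.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so that the already uploaded parts do not keep accruing storage cost.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return len(parts)
```

With `urllib.request.urlopen(url)` as the source, `chunks` can be `iter(lambda: response.read(1024 * 1024), b"")`, so only about one part is held in memory at a time. Lambda's 15-minute maximum run time still applies, so this only works if the whole download finishes within it.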

Option 2: start an EC2 instance for the duration of the job

The scheduled Lambda function can use EC2's API to start an instance. The job script can be passed to the instance as user data (a script the instance executes at boot time). When the job is done, that script can call TerminateInstances on its own instance so that you stop being charged for it. The downside is that you pay for the time the instance is running (the free tier includes 750 hours per month of t2.micro usage). The upside is that you can use standard command-line tools such as the AWS CLI and gzip, and you have plenty of local storage for the task.
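
A minimal sketch of what the Lambda side of this could look like, assuming an AMI that already has curl and the AWS CLI installed and an instance profile that allows writing to the bucket (the function names and all argument values are hypothetical). It launches a fresh instance with boto3's `run_instances`. Note that it swaps the explicit TerminateInstances call for a simpler variant: the instance is launched with its shutdown behavior set to terminate, and the script just shuts the machine down when it exits.

```python
def build_user_data(url, bucket, key):
    """Return a boot script that runs the job, then powers the instance off."""
    return f"""#!/bin/bash
set -o pipefail
# Shut down on exit, even if the job fails, so the instance is not left running.
trap 'shutdown -h now' EXIT
curl -sSf '{url}' | gzip | aws s3 cp - 's3://{bucket}/{key}'
"""


def launch_job_instance(ec2, image_id, instance_profile_name, user_data):
    """Launch one instance that terminates itself when it shuts down."""
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
        UserData=user_data,  # boto3 base64-encodes this for you
        IamInstanceProfile={"Name": instance_profile_name},
        # With "terminate", an OS-level shutdown terminates the instance
        # instead of merely stopping it.
        InstanceInitiatedShutdownBehavior="terminate",
    )
    return response["Instances"][0]["InstanceId"]
```

Inside the scheduled function this would be called as `launch_job_instance(boto3.client("ec2"), image_id, profile_name, build_user_data(url, bucket, key))`. The pipeline streams straight to S3, so the 4 GB file does not even need to touch the local disk.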

Here is how to start an existing instance from Python (to launch a new instance with user data, the corresponding boto3 call is run_instances): https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.start_instances

Sébastien Stormacq