I have a production web-scraping service and find myself paying for a lot of idle EC2 hours. I recently finished setting up serverless task scheduling with Mark Edmondson's googleCloudRunner, but I run into issues when inserting new data into my production DB on AWS. That said, I've wondered for years whether the approach below is doable.
- Launching an EC2 instance via paws is certainly possible:
```r
ec2 <- paws::ec2()

# Start an EC2 instance.
resp <- ec2$run_instances(
  ImageId = "ami-f973ab84",
  InstanceType = "t2.micro",
  MinCount = 1,
  MaxCount = 1,
  KeyName = "default",
  Placement = list(AvailabilityZone = "us-east-1a"),
  TagSpecifications = list(
    list(
      ResourceType = "instance",
      Tags = list(
        list(Key = "webserver", Value = "production")
      )
    ),
    list(
      ResourceType = "volume",
      Tags = list(
        list(Key = "cost-center", Value = "cc123")
      )
    )
  )
)
```
- Running a script at startup also looks very possible. From the AWS documentation on EC2 user data: "When you launch an instance in Amazon EC2, you have the option of passing user data to the instance that can be used to perform common automated configuration tasks and even run scripts after the instance starts."
How can I pass user data (i.e. a script) with paws in R?
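A minimal sketch, assuming an AMI that already has R installed (so `Rscript` is on the PATH) and a scraper script at the hypothetical path `/home/ec2-user/scrape.R`. `run_instances()` accepts a `UserData` argument; the EC2 API expects it base64-encoded, and as far as I know paws passes it through unchanged (unlike the AWS CLI, which encodes it for you), so the sketch encodes it with `base64enc::base64encode()`. The helper names `make_user_data()` and `launch_with_user_data()` are hypothetical.

```r
# Hypothetical helper: build a bash user-data script that runs an R script.
# Assumes R is preinstalled on the AMI; the script path is hypothetical.
make_user_data <- function(r_script_path = "/home/ec2-user/scrape.R") {
  paste(
    "#!/bin/bash",                   # cloud-init runs scripts starting with #!
    paste("Rscript", r_script_path), # run the scraper
    sep = "\n"
  )
}

# Hypothetical helper: launch an instance that runs the user-data script.
# UserData is base64-encoded here on the assumption that paws does not do it.
launch_with_user_data <- function(user_data, ...) {
  ec2 <- paws::ec2()
  ec2$run_instances(
    ImageId = "ami-f973ab84",
    InstanceType = "t2.micro",
    MinCount = 1,
    MaxCount = 1,
    UserData = base64enc::base64encode(charToRaw(user_data)),
    ...  # e.g. KeyName, Placement, TagSpecifications
  )
}

# Usage (requires AWS credentials):
# resp <- launch_with_user_data(make_user_data(), KeyName = "default")
```

On Amazon Linux, cloud-init runs a user-data script that starts with `#!` as root on the instance's first boot, so the scraper starts without anyone logging in.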
- Terminating after the run
Can I terminate an EC2 instance from within that same instance?
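One way to do it, sketched under the same assumptions (R preinstalled on the AMI; hypothetical script path, log path, and helper names): launch with `InstanceInitiatedShutdownBehavior = "terminate"` and end the user-data script with `shutdown -h now`. With that setting, an OS-level shutdown issued from inside the instance terminates it instead of merely stopping it.

```r
# Hypothetical helper: user data that runs the scraper, then powers off.
# Paths are hypothetical; there is no `set -e`, so the shutdown line runs
# whether or not the R script succeeds.
make_self_terminating_user_data <- function(r_script_path = "/home/ec2-user/scrape.R") {
  paste(
    "#!/bin/bash",
    paste("Rscript", r_script_path, "> /var/log/scrape.log 2>&1"),
    "shutdown -h now",
    sep = "\n"
  )
}

# Hypothetical helper: with InstanceInitiatedShutdownBehavior = "terminate",
# the shutdown above terminates the instance rather than stopping it.
launch_self_terminating <- function(user_data, ...) {
  ec2 <- paws::ec2()
  ec2$run_instances(
    ImageId = "ami-f973ab84",
    InstanceType = "t2.micro",
    MinCount = 1,
    MaxCount = 1,
    InstanceInitiatedShutdownBehavior = "terminate",
    UserData = base64enc::base64encode(charToRaw(user_data)),
    ...
  )
}

# Usage (requires AWS credentials):
# resp <- launch_self_terminating(make_self_terminating_user_data())
```

Because the shutdown is an OS command, the instance needs no AWS credentials to end itself. The alternative, calling `ec2$terminate_instances()` from R on the instance, would need an IAM instance profile that allows `ec2:TerminateInstances`, plus the instance's own ID from the instance metadata service.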
Otherwise, I would probably handle termination from a small on-demand instance that checks the logs my tasks write to the production database once they complete. This part I'm not worried about.