
I am an absolute beginner with AWS: I have created a key pair and an instance. The Python script I want to run in the EC2 environment needs to loop through around 80,000 filings, tokenize the sentences in them, and use those sentences for some unsupervised learning.

This might be a duplicate, but I can't find a way to copy these filings to the EC2 environment and run the Python script there, and I am also not sure how to use boto3. I am using macOS. I am just looking for any way to speed things up. Thank you so much!

2 Answers


Here's one way that might help:

  • create a simple IAM role that allows S3 access to the bucket holding your files
  • apply that IAM role to the running EC2 instance (or launch a new instance with the IAM role)
  • install the awscli on the EC2 instance
  • SSH to the instance and sync the S3 files to the EC2 instance using aws s3 sync
  • run your app

I'm assuming you've launched the EC2 instance with enough disk space to hold the files.
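
If you'd rather do the copy from Python instead of the aws s3 sync command, here's a minimal boto3 sketch that downloads every object under a prefix. The bucket name, prefix, and local directory are placeholders you'd swap for your own; because the instance has the IAM role attached, boto3 picks up credentials automatically.

    import os
    import boto3

    # Placeholder names -- replace with your own bucket, prefix, and local path.
    BUCKET = "my-filings-bucket"
    PREFIX = "filings/"
    LOCAL_DIR = "/home/ec2-user/filings"

    s3 = boto3.resource("s3")  # uses the instance's IAM role credentials
    bucket = s3.Bucket(BUCKET)

    for obj in bucket.objects.filter(Prefix=PREFIX):
        if obj.key.endswith("/"):  # skip "folder" placeholder objects
            continue
        target = os.path.join(LOCAL_DIR, os.path.relpath(obj.key, PREFIX))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        bucket.download_file(obj.key, target)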

jarmod

Here's what I tried recently:

  1. Create the bucket and make it publicly accessible.
  2. Create the role and add the HTTP option.
  3. Upload all the files and make sure they are publicly accessible.
  4. Get the HTTP link of the S3 file.
  5. Connect to the instance through PuTTY.
  6. Use wget to copy the file into the EC2 instance.

If your files are in a zip archive, a single copy is enough to move all the files into the instance.
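
If you'd prefer to do the download from Python rather than wget, here is a rough standard-library equivalent; the URL is a placeholder for the public object link you copied in step 4.

    import urllib.request

    # Placeholder URL -- use the public object link from the S3 console.
    url = "https://my-filings-bucket.s3.amazonaws.com/filings.zip"

    # Download the public object to local disk, much like wget would.
    urllib.request.urlretrieve(url, "filings.zip")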

Vamsi