
So the problem is the following: I have several .zip files of relatively big size (20-30GB) that I have uploaded to an S3 bucket. Unzipping them requires a fairly complex procedure (in my case the standard approach using Lambda will not work, I believe, because the unarchived documents will total around 100-105GB), so I thought of using these .zip files on a SageMaker notebook instance and unarchiving them there. The problem is that I have quite weak internet bandwidth and have trouble uploading the files directly as one chunk. Is there any way to transfer these files from the S3 bucket to the SageMaker instance (not using direct uploads from my local machine)?

Many thanks!

Keithx
    The traditional way of handling this is to spin up an EC2 instance to use for a little bit. Give it a decent amount of disk space, copy the files from S3, do what you need to with them, transfer them to where you need to and shutdown the EC2. You may pay a couple of dollars depending on how long it takes you but then that's it. – stdunbar Aug 13 '23 at 19:35
  • "using these .zip files on Sagemaker Notebook instance" — this seems like the right idea. If it's a SageMaker notebook *instance*, isn't that essentially an EC2 instance integrated with SageMaker? Your local connection shouldn't be relevant in that case. Why is it? I'll post a likely dup – erik258 Aug 13 '23 at 21:30
  • Does this answer your question? [Load S3 Data into AWS SageMaker Notebook](https://stackoverflow.com/questions/48264656/load-s3-data-into-aws-sagemaker-notebook) – erik258 Aug 13 '23 at 21:31

1 Answer

  1. Ensure you have enough disk space on the SageMaker notebook instance (the EBS volume size can be set when you create the instance).
  2. On the notebook instance, open a terminal and run `aws s3 sync s3://bucket/zip_files_folder ./SageMaker`
  3. Unzip and so on...

Your local internet connection makes no difference, as the files are transferred between S3 and the notebook instance, both of which run inside the AWS Region.
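Step 3 can be scripted from the notebook itself. A minimal sketch, assuming the archives were synced to a local folder (the directory names here are placeholders, not from the question); it extracts member by member so a 100GB archive never needs to fit in memory:

```python
import zipfile
from pathlib import Path

def extract_all(zip_dir: str, out_dir: str) -> list:
    """Extract every .zip in zip_dir into out_dir, one member at a
    time, and return the list of extracted member names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    for zip_path in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as zf:
            for member in zf.infolist():
                zf.extract(member, out)
                extracted.append(member.filename)
    return extracted
```

On the instance you would point `zip_dir` at the synced folder (e.g. `~/SageMaker`). The download itself can also be done from Python with boto3's `download_file`, which handles multipart transfers and retries for you.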

Gili Nachum