I currently have a Jupyter notebook running on an Amazon EC2 Spot Instance. If I terminate the instance, all data will be deleted. Is there a way to download all the data from the server to my hard drive and re-upload it at a later date? My current directory structure looks like this:

[Image: screenshot of the directory structure]

John Rotenstein
Parseltongue

2 Answers

2

If you just want to download specific directories, the easiest approach is to create a Zip file and transfer it however you wish (e.g., via FTP or S3, whatever you're comfortable using).
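As a minimal sketch of the Zip approach, assuming Python is available on the instance (it is if Jupyter runs there): the helper names `archive_directory` and `restore_directory` are made up for illustration, and the transfer step itself (FTP, S3, etc.) is left out.

```python
import shutil
from pathlib import Path


def archive_directory(source_dir, archive_base):
    """Zip source_dir into <archive_base>.zip and return the archive path."""
    source = Path(source_dir).resolve()
    # root_dir/base_dir keep the top-level folder name inside the archive.
    return shutil.make_archive(
        str(archive_base), "zip", root_dir=str(source.parent), base_dir=source.name
    )


def restore_directory(archive_path, target_dir):
    """Unpack an archive created by archive_directory into target_dir."""
    shutil.unpack_archive(str(archive_path), str(target_dir))
```

Write the archive somewhere outside the directory being zipped, otherwise the archive may end up trying to include itself. After re-uploading the Zip file to a new instance, `restore_directory` recreates the original folder under `target_dir`.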

If you want to back up the whole machine with its full software configuration, then I'd recommend making an AMI (Amazon Machine Image) of the instance. In the EC2 console, choose Actions / Image / Create Image. This will create a copy of the whole disk. You can later launch a new EC2 instance directly from the AMI and it will have an exact copy of the disk. Please note that there is a storage charge for AMIs/snapshots.

Making an AMI is certainly simpler (a few clicks to create, a few clicks to launch in future!).
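If you would rather script it than click through the console, here is a rough sketch using boto3's EC2 client and its `create_image` call. The function name `create_backup_image`, the instance ID, and the image name are placeholders, and it assumes boto3 is installed and AWS credentials are already configured.

```python
def create_backup_image(ec2_client, instance_id, image_name, no_reboot=False):
    """Ask EC2 to create an AMI from an instance and return the new image ID."""
    response = ec2_client.create_image(
        InstanceId=instance_id,
        Name=image_name,
        # With NoReboot=False, EC2 reboots the instance so the disk is
        # captured in a consistent state.
        NoReboot=no_reboot,
    )
    return response["ImageId"]


# Hypothetical usage (requires boto3 and configured AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   print(create_backup_image(ec2, "i-0123456789abcdef0", "jupyter-backup"))
```

The client is passed in as an argument so the function can be tried against a stub before pointing it at a real account.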

Oh, and be careful running on a Spot Instance -- your machine might be terminated at any time with very little notice, so I'd recommend making an AMI as soon as possible so you don't lose your setup and your data!

John Rotenstein
  • 1
    This is extremely helpful! Thank you so much. I've opted to use Spot Instances to save money (and somehow I'm still hemorrhaging money). Do you know if storing the images costs anything? – Parseltongue Jun 30 '17 at 04:05
  • 2
    [Amazon EBS pricing](https://aws.amazon.com/ebs/pricing/) shows snapshots at 5c/GB/month (US regions) -- it only copies blocks on the disk that contain data, so it won't necessarily be as big as your disk. And remember -- the best way to save money is to turn machines off when they're not in use (although that is hard to combine with Spot pricing). – John Rotenstein Jun 30 '17 at 04:20
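To put a rough number on that, here is a back-of-the-envelope sketch using the 5c/GB/month figure quoted above. The function name is made up, and the price is only the default from that comment; check the pricing page for your region.

```python
def estimate_snapshot_cost(used_gb, price_per_gb_month=0.05, months=1):
    """Rough storage cost in USD for a snapshot holding used_gb of data."""
    # Only blocks that contain data are copied, so pass the used space,
    # not the full size of the disk.
    return used_gb * price_per_gb_month * months
```

For example, a disk with 30 GB of actual data would come to about $1.50 per month at that rate.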
1

If it's only a Jupyter notebook that you want to save, and resume running it later, then refer to this question.

Otherwise, I'd suggest SSHing in, packing all the files and directories into a single tarball, and then using scp to copy the tarball to your own computer.
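A minimal sketch of the tarball step, run on the instance (for example from a notebook cell). The helper name `make_tarball` is made up, and the key file, user name, and host in the scp commands are placeholders you would need to replace with your own.

```python
import tarfile
from pathlib import Path


def make_tarball(source_dir, tarball_path):
    """Pack source_dir (recursively) into a gzip-compressed tarball."""
    source = Path(source_dir).resolve()
    with tarfile.open(str(tarball_path), "w:gz") as tar:
        # arcname stores entries under the folder name, not the absolute path.
        tar.add(str(source), arcname=source.name)
    return Path(tarball_path)


# Then, from your own computer (key file, user, and host are placeholders):
#   scp -i my-key.pem ec2-user@<instance-public-dns>:backup.tar.gz .
# To restore later, copy it back and unpack it on the new instance:
#   scp -i my-key.pem backup.tar.gz ec2-user@<instance-public-dns>:
#   tar -xzf backup.tar.gz
```

As with any archive, write the tarball outside the directory being packed so it does not try to include itself.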

Jedi