I have a large CSV file stored in S3. I would like to download, edit, and re-upload this file without it ever touching my hard drive, i.e. read it straight into memory from S3. I am using the Python library boto3. Is this possible?

Ciaran
  • Whenever a program runs on a machine, it implicitly or explicitly "touches" the hard drive, so I am fairly sure this must be a hypothetical question. You can use the Pandas library to read the CSV file into memory, process it there, and then save it back to the file system. – Mantosh Kumar Nov 20 '19 at 05:19
  • @MantoshKumar I think the approach you suggested loads the file into RAM and won't save it to disk, so how would it touch the hard drive? Do you mean `to_csv` will do that? – Yugandhar Chaudhari Nov 20 '19 at 05:43

1 Answer

You should look into the `io` module.

Depending on whether you want to read the file as text or as bytes, you can create a `StringIO()` or `BytesIO()` object and download your file into that stream.
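As a rough sketch of that idea, the function below downloads an object into a `BytesIO` buffer, edits it with the standard `csv` module, and uploads the result from a second buffer, so nothing is written to disk by this code. It assumes `s3_client` is a boto3 S3 client (for example `boto3.client("s3")`), that the file is UTF-8 encoded, and that the whole file fits in RAM. The names `edit_csv_in_memory` and `transform_row` are hypothetical and only used for illustration.

```python
import csv
import io


def edit_csv_in_memory(s3_client, bucket, key, transform_row):
    """Download a CSV object, edit each row, and upload it back, all in memory.

    s3_client:     assumed to be a boto3 S3 client, e.g. boto3.client("s3")
    transform_row: hypothetical callback taking and returning a list of strings
    """
    # Download the object's bytes into an in-memory buffer instead of a file.
    raw = io.BytesIO()
    s3_client.download_fileobj(bucket, key, raw)
    raw.seek(0)  # rewind so the buffer can be read from the start

    # Wrap the bytes as text so the csv module can parse them.
    # UTF-8 is an assumption about the file's encoding.
    text_in = io.TextIOWrapper(raw, encoding="utf-8", newline="")
    text_out = io.StringIO(newline="")
    writer = csv.writer(text_out)
    for row in csv.reader(text_in):
        writer.writerow(transform_row(row))

    # Encode the edited text back to bytes and upload it from memory.
    edited = io.BytesIO(text_out.getvalue().encode("utf-8"))
    s3_client.upload_fileobj(edited, bucket, key)
```

With a real client this might be called as `edit_csv_in_memory(boto3.client("s3"), "my-bucket", "data.csv", lambda row: row)`, where the bucket and key are placeholders. Note that this keeps the original bytes and the edited text in memory at the same time, so a very large file may need a streaming approach instead.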

You should check out these answers:

  1. How to read image file from S3 bucket directly into memory?
  2. How to read a csv file from an s3 bucket using Pandas in Python
Abhinav_A