
I am looking to use boto3, or something like smart_open, in Python to read a file from S3 line by line, process each line (e.g. clean certain fields), and then write those lines back to S3. The key constraint is never holding the full file in memory. Any suggestions? I have tried the following without success:

import smart_open

# smart_open at the time accepted credentials embedded in the S3 URI
into = "s3://" + access_key + ":" + secret_key + "@" + bucket + "/Filetoread.csv"
out = "s3://" + access_key + ":" + secret_key + "@" + bucket + "/Filetowrite.csv"

def streamline(inputfile, outputfile):
    # open both objects as streams; neither file is read fully into memory
    with smart_open.smart_open(inputfile, 'r') as infile, \
         smart_open.smart_open(outputfile, 'w') as outfile:
        for line in infile:
            outfile.write(line)  # each line already ends with '\n'

streamline(into, out)
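
For reference, here is a minimal sketch of the same pipeline using plain boto3 instead of smart_open (untested; the bucket/key names and the clean() function are placeholders). iter_lines() on the response's StreamingBody yields one line at a time, and upload_fileobj streams the result back up as a multipart upload, so the full file never sits in memory:

import tempfile
import boto3

s3 = boto3.client("s3")  # credentials come from the usual boto3 chain

def clean(line):
    # hypothetical per-line transformation; replace with real field cleaning
    return line.strip()

def streamline_boto3(bucket, in_key, out_key):
    body = s3.get_object(Bucket=bucket, Key=in_key)["Body"]
    # spool processed lines to a temp file on disk (not memory), then
    # stream that file back to S3 as a multipart upload
    with tempfile.TemporaryFile() as spool:
        for line in body.iter_lines():  # yields bytes with the newline stripped
            spool.write(clean(line) + b"\n")
        spool.seek(0)
        s3.upload_fileobj(spool, bucket, out_key)

streamline_boto3("my-bucket", "Filetoread.csv", "Filetowrite.csv")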
jumpman23
  • I'm not familiar with `smart_open`, but I answered a similar question recently: https://stackoverflow.com/questions/48529862/replace-single-character-in-a-line-of-a-text-file-with-python/48531038#48531038. You could see if you can adapt that approach to your solution. – r.ook Jan 31 '18 at 22:08
  • Yeah, that is similar; I just need an answer specific to S3 and working within the AWS ecosystem. – jumpman23 Feb 01 '18 at 02:15
  • @jumpman23 Does this answer your question? [How to read image file from S3 bucket directly into memory?](https://stackoverflow.com/questions/44043036/how-to-read-image-file-from-s3-bucket-directly-into-memory) – Lucas Roberts Oct 26 '21 at 01:20

0 Answers