I am looking to use boto3, or something like smart_open, in Python to read a file from S3 line by line, process each line (e.g. clean certain fields), and then write those lines back to S3. The key constraint is never holding the whole file in memory. Any suggestions? I have tried the following without success:
import smart_open

into = "s3://" + access_key + ":" + secret_key + "@" + bucket + "/Filetoread.csv"
out = "s3://" + access_key + ":" + secret_key + "@" + bucket + "/Filetowrite.csv"

def streamline(inputfile, outputfile):
    with smart_open.smart_open(inputfile, 'r') as infile, smart_open.smart_open(outputfile, 'w') as outfile:
        for line in infile:
            # lines read in text mode keep their trailing '\n', so writing
            # line + '\n' would double every newline
            outfile.write(line)

streamline(into, out)
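In case it helps, here is a minimal sketch of the same pipeline against the newer smart_open.open API (smart_open 5.x or later), which takes credentials through a boto3 client in transport_params rather than embedding them in the URL. clean_line and the bucket/key names are placeholders for your own processing and paths:

import boto3
from smart_open import open as s_open

def clean_line(line):
    # Placeholder: replace with the real per-field cleaning
    return line

session = boto3.Session(aws_access_key_id=access_key,
                        aws_secret_access_key=secret_key)
transport_params = {"client": session.client("s3")}

with s_open("s3://" + bucket + "/Filetoread.csv", "r", transport_params=transport_params) as infile, \
     s_open("s3://" + bucket + "/Filetowrite.csv", "w", transport_params=transport_params) as outfile:
    for line in infile:                  # download is streamed line by line
        outfile.write(clean_line(line))  # upload is buffered into multipart parts

Note that smart_open's S3 writer does keep one multipart part in memory at a time (50 MB by default, if I recall correctly), so memory use is bounded by the part size rather than the file size.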
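If you would rather stay on plain boto3, a sketch of the same idea: stream the download with StreamingBody.iter_lines() and stream the upload with the multipart-upload API, flushing a buffered part whenever it reaches 5 MiB (S3's minimum size for every part except the last). The bucket and key names here are again placeholders:

import boto3

BUCKET = bucket                      # placeholder names, as above
SRC_KEY = "Filetoread.csv"
DST_KEY = "Filetowrite.csv"
PART_SIZE = 5 * 1024 * 1024          # minimum S3 part size (except the last part)

def clean_line(line):
    # Placeholder: replace with real per-field cleaning (bytes in, bytes out)
    return line

s3 = boto3.client("s3", aws_access_key_id=access_key,
                  aws_secret_access_key=secret_key)
body = s3.get_object(Bucket=BUCKET, Key=SRC_KEY)["Body"]

mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=DST_KEY)
upload_id = mpu["UploadId"]
parts, buf, part_number = [], bytearray(), 1

def flush(buf, part_number):
    resp = s3.upload_part(Bucket=BUCKET, Key=DST_KEY, UploadId=upload_id,
                          PartNumber=part_number, Body=bytes(buf))
    parts.append({"ETag": resp["ETag"], "PartNumber": part_number})

try:
    for line in body.iter_lines():   # yields bytes lines without the trailing newline
        buf += clean_line(line) + b"\n"
        if len(buf) >= PART_SIZE:
            flush(buf, part_number)
            part_number += 1
            buf.clear()
    if buf:                          # final part may be smaller than 5 MiB
        flush(buf, part_number)
    s3.complete_multipart_upload(Bucket=BUCKET, Key=DST_KEY, UploadId=upload_id,
                                 MultipartUpload={"Parts": parts})
except Exception:
    s3.abort_multipart_upload(Bucket=BUCKET, Key=DST_KEY, UploadId=upload_id)
    raise

This keeps at most one ~5 MiB part in memory at a time, at the cost of smart_open's convenience.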