I'm learning how to use Python in the AWS Lambda service. I'm trying to read characters from an S3 object and write them to another S3 object. I realize I can copy the S3 object to a local tmp file, but I want to "stream" the S3 input into the script, process it, and write the output without the local copy stage if possible. I'm using code from the second answer to a Stack Overflow question that suggests a solution for this.
This code contains two "yield" statements, which cause my otherwise working script to throw a "generator is not JSON serializable" error. I'm trying to understand why a "yield" statement would throw this error. Is this a Lambda environment restriction, or is something specific to my code creating the serialization issue (perhaps the use of an S3 file object)?
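For what it's worth, the error seems to be reproducible outside Lambda, so it may not depend on S3 at all. The sketch below is a minimal stand-in, not my real handler: `handler_with_yield` and `calls` are hypothetical names, and it assumes that Lambda tries to JSON-serialize whatever the handler returns. Any function containing a yield becomes a generator function, so calling it returns a generator object without running the body.

```python
import json
import types

calls = []  # records whether the function body actually ran

def handler_with_yield(event, context):
    # Hypothetical stand-in for lambda_handler: the yield below turns
    # this whole function into a generator function.
    calls.append("body ran")
    yield "some line"

# Calling the function only creates a generator object; the body has not run.
result = handler_with_yield({}, None)

try:
    # Roughly what happens to a handler's return value (assumption).
    json.dumps(result)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(isinstance(result, types.GeneratorType), calls, error_message)
```

If this is representative, the serialization error comes from the generator object itself, and none of the code in the handler body has executed by the time the error is raised.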
Here is the code that I run in Lambda. If I comment out the two yield statements, it runs, but the output file is empty.
from __future__ import print_function
import json
import urllib
import uuid
import boto3
import re

print('Loading IO function')
s3 = boto3.client('s3')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))
    # Get the object from the event and show its content type
    inbucket = event['Records'][0]['s3']['bucket']['name']
    outbucket = "outlambda"
    inkey = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    outkey = "out" + inkey
    try:
        infile = s3.get_object(Bucket=inbucket, Key=inkey)
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(inkey, inbucket))
        raise e
    tmp_path = '/tmp/{}{}'.format(uuid.uuid4(), "tmp.txt")
    # upload_path = '/tmp/resized-{}'.format(key)
    with open(tmp_path,'w') as out:
        unfinished_line = ''
        for byte in infile:
            byte = unfinished_line + byte
            #split on whatever, or use a regex with re.split()
            lines = byte.split('\n')
            unfinished_line = lines.pop()
            for line in lines:
                out.write(line)
                yield line # This line causes JSON error if uncommented
        yield unfinished_line # This line causes JSON error if uncommented
    #
    # Upload the file to S3
    #
    tmp = open(tmp_path,"r")
    try:
        outfile = s3.put_object(Bucket=outbucket,Key=outkey,Body=tmp)
    except Exception as e:
        print(e)
        print('Error putting object {} to bucket {} Body {}. Make sure they exist and your bucket is in the same region as this function.'.format(outkey, outbucket,"tmp.txt"))
        raise e
    tmp.close()
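One direction I am considering is to move the line splitting into its own generator and keep the handler a regular function that consumes it. The sketch below is only an illustration under assumptions: `iter_lines`, `process`, and `fake_chunks` are hypothetical names, the chunks are assumed to be text strings, and the S3 wiring is left out. In the real handler the chunks would presumably come from repeated `read(amt)` calls on the `Body` entry of the `get_object` response, since `get_object` returns a dict rather than a file-like object.

```python
def iter_lines(chunks):
    """Yield complete lines from an iterable of text chunks."""
    unfinished = ''
    for chunk in chunks:
        # Prepend the leftover from the previous chunk, then split.
        pieces = (unfinished + chunk).split('\n')
        # The last piece may be an incomplete line; hold it back.
        unfinished = pieces.pop()
        for line in pieces:
            yield line
    if unfinished:
        # Emit whatever remains after the final chunk.
        yield unfinished

def process(chunks):
    # A regular function (no yield here): it consumes the generator
    # and returns a plain list, which is JSON serializable.
    return [line.upper() for line in iter_lines(chunks)]

# Stand-in for data read from S3 in pieces (hypothetical sample data).
fake_chunks = ['first li', 'ne\nsecond line\nthi', 'rd line']
result = process(fake_chunks)
print(result)
```

Because the yield statements live only in `iter_lines`, the function that returns a value to the caller stays an ordinary function, which is the property the handler seems to need.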