
I'm learning how to use Python in the AWS Lambda service. I'm trying to read characters from an S3 object and write them to another S3 object. I realize I could copy the S3 object to a local tmp file, but I wanted to "stream" the S3 input into the script, process it, and write the output without the local-copy stage if possible. I'm using code from the second answer to a StackOverflow question that suggests a solution for this.

This code contains two yield statements, which cause my otherwise-working script to throw a "generator is not JSON serializable" error. I'm trying to understand why a yield statement would throw this error. Is this a Lambda environment restriction, or is it something specific to my code that creates the serialization issue (likely due to using an S3 file object)?
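For what it's worth, the error can be reproduced entirely outside Lambda; this is a minimal sketch (the function name is illustrative) showing that any function containing yield returns a generator object when called, and json.dumps() refuses to serialize it:

```python
import json

def handler_like():
    yield "some line"

result = handler_like()        # calling it does NOT run the body
print(type(result).__name__)   # generator

try:
    json.dumps(result)         # Lambda serializes your handler's return value
except TypeError as err:
    print("serialization failed:", err)
```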

Here is my code that I run in Lambda. If I comment out the two yield statements it runs but the output file is empty.

from __future__ import print_function

import json
import urllib
import uuid
import boto3
import re

print('Loading IO function')

s3 = boto3.client('s3')


def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

# Get the object from the event and show its content type
inbucket  = event['Records'][0]['s3']['bucket']['name']
outbucket = "outlambda"
inkey     = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
outkey    = "out" + inkey
try:
    infile = s3.get_object(Bucket=inbucket, Key=inkey)

except Exception as e:
    print(e)
    print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(inkey, bucket))
    raise e

    tmp_path = '/tmp/{}{}'.format(uuid.uuid4(), "tmp.txt")
#   upload_path = '/tmp/resized-{}'.format(key)

    with open(tmp_path,'w') as out:
        unfinished_line = ''
        for byte in infile:
             byte = unfinished_line + byte
             #split on whatever, or use a regex with re.split()
             lines = byte.split('\n')
             unfinished_line = lines.pop()
             for line in lines:
                  out.write(line)
                  yield line          # This line causes JSON error if uncommented
             yield unfinished_line    # This line causes JSON error if uncommented
    #
    # Upload the file to S3
    #
    tmp = open(tmp_path,"r")
    try:
       outfile = s3.put_object(Bucket=outbucket,Key=outkey,Body=tmp)
    except Exception as e:
       print(e)
       print('Error putting object {} from bucket {} Body {}. Make sure they exist and your bucket is in the same region as this function.'.format(outkey, outbucket,"tmp.txt"))
       raise e

    tmp.close()
Ross Youngblood
  • Are you sure the code is formatted correctly? Specifically is the 'with open' block really inside the exception handling code? – FujiApple Aug 17 '16 at 23:11
  • Thanks, I found several other significant issues with this code. I will look at that as well. (Missed that one). – Ross Youngblood Aug 18 '16 at 15:35

2 Answers


A function that includes yield is actually a generator, whereas the Lambda handler needs to be a plain function that optionally returns a JSON-serializable value.
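A minimal sketch of the distinction (the event shape and names here are illustrative, not part of the Lambda API): keep the yield in a helper generator and have the handler consume it, returning an ordinary JSON-serializable value:

```python
import json

def read_lines(text):
    # generator: fine as a helper, but not as the handler itself
    for line in text.split('\n'):
        yield line

def lambda_handler(event, context):
    # consume the generator so its body actually runs
    lines = list(read_lines(event.get('body', '')))
    return {'count': len(lines)}   # JSON-serializable

out = lambda_handler({'body': 'a\nb\nc'}, None)
print(json.dumps(out))  # {"count": 3}
```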

Lei Shi

Thanks to Lei Shi for answering the specific point I was asking about, and thanks to FujiApple for pointing out a coding mistake I had missed in the original code. I was able to develop a solution without yield that copied the input file to the output. With Lei Shi's and FujiApple's comments I was then able to restructure that code into a sub-function, called by the lambda handler, which could be a generator.

from __future__ import print_function

import json
import urllib
import uuid
import boto3
import re
print('Loading IO function')

s3 = boto3.client('s3')

def processFile(inbucket, inkey, outbucket, outkey):
    try:
        infile = s3.get_object(Bucket=inbucket, Key=inkey)
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(inkey, inbucket))
        raise e

    inbody   = infile['Body']
    tmp_path = '/tmp/{}{}'.format(uuid.uuid4(), "tmp.txt")

    with open(tmp_path, 'w') as out:
        unfinished_line = ''
        chunk = inbody.read(4096)
        while chunk:
            chunk = unfinished_line + chunk
            # split on whatever, or use a regex with re.split()
            lines = chunk.split('\n')
            unfinished_line = lines.pop()
            for line in lines:
                out.write(line + '\n')
                yield line
            chunk = inbody.read(4096)
        # the final line has no trailing newline, so it is still in
        # unfinished_line when the read loop ends; emit it separately
        if unfinished_line:
            out.write(unfinished_line)
            yield unfinished_line

    #
    # Upload the file to S3
    #
    with open(tmp_path, "r") as tmp:
        try:
            s3.put_object(Bucket=outbucket, Key=outkey, Body=tmp)
        except Exception as e:
            print(e)
            print('Error putting object {} to bucket {}. Make sure the bucket exists and is in the same region as this function.'.format(outkey, outbucket))
            raise e

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    inbucket  = event['Records'][0]['s3']['bucket']['name']
    outbucket = "outlambda"
    inkey     = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    outkey    = "out" + inkey

    # processFile is a generator, so it must be iterated for its body to run
    for line in processFile(inbucket, inkey, outbucket, outkey):
        pass

This solution uses yield in a sub "generator" function. One subtlety: the final line of the input has no trailing newline, so it is left in unfinished_line when the read loop ends and must be written out separately; that is what the if clause handles.
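The chunked line-splitting logic can be checked locally by substituting io.StringIO for the S3 body (a hypothetical stand-in with the same read() interface). This sketch shows why the final line must be emitted after the loop:

```python
import io

def split_stream(body, chunk_size=4096):
    # same pattern as processFile, minus the S3 and tmp-file plumbing
    unfinished_line = ''
    chunk = body.read(chunk_size)
    while chunk:
        chunk = unfinished_line + chunk
        lines = chunk.split('\n')
        unfinished_line = lines.pop()
        for line in lines:
            yield line
        chunk = body.read(chunk_size)
    if unfinished_line:        # the last line has no trailing '\n'
        yield unfinished_line

body = io.StringIO("first\nsecond\nthird")  # no trailing newline
print(list(split_stream(body, chunk_size=4)))
# ['first', 'second', 'third']
```

A tiny chunk size is used deliberately so that lines span chunk boundaries, exercising the unfinished_line carry-over.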

Ross Youngblood