
I'm learning how to use Python in the AWS Lambda service. I'm trying to read characters from an S3 object and write them to another S3 object. I realize I could copy the S3 object to a local tmp file, but I wanted to "stream" the S3 input into the script, process it, and write the output without the local-copy stage if possible. I'm using code from the second answer to a StackOverflow question that suggests a solution for this.

This code contains two yield statements, which cause my otherwise-working script to throw a "generator is not JSON serializable" error. I'm trying to understand why a yield statement would throw this error. Is this a Lambda environment restriction, or is it something specific to my code that creates the serialization issue (likely due to using an S3 file object)?
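For what it's worth, the error can be reproduced entirely outside Lambda; this is a minimal sketch (the function name is illustrative) showing that any function containing yield returns a generator object when called, and json.dumps() refuses to serialize it:

```python
import json

def handler_like():
    yield "some line"

result = handler_like()        # calling it does NOT run the body
print(type(result).__name__)   # generator

try:
    json.dumps(result)         # Lambda serializes your handler's return value
except TypeError as err:
    print("serialization failed:", err)
```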

Here is my code that I run in Lambda. If I comment out the two yield statements it runs but the output file is empty.

from __future__ import print_function

import json
import urllib
import uuid
import boto3
import re

print('Loading IO function')

s3 = boto3.client('s3')


def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

# Get the object from the event and show its content type
inbucket  = event['Records'][0]['s3']['bucket']['name']
outbucket = "outlambda"
inkey     = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
outkey    = "out" + inkey
try:
    infile = s3.get_object(Bucket=inbucket, Key=inkey)

except Exception as e:
    print(e)
    print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(inkey, bucket))
    raise e

    tmp_path = '/tmp/{}{}'.format(uuid.uuid4(), "tmp.txt")
#   upload_path = '/tmp/resized-{}'.format(key)

    with open(tmp_path,'w') as out:
        unfinished_line = ''
        for byte in infile:
             byte = unfinished_line + byte
             #split on whatever, or use a regex with re.split()
             lines = byte.split('\n')
             unfinished_line = lines.pop()
             for line in lines:
                  out.write(line)
                  yield line          # This line causes JSON error if uncommented
             yield unfinished_line    # This line causes JSON error if uncommented
    #
    # Upload the file to S3
    #
    tmp = open(tmp_path,"r")
    try:
       outfile = s3.put_object(Bucket=outbucket,Key=outkey,Body=tmp)
    except Exception as e:
       print(e)
       print('Error putting object {} from bucket {} Body {}. Make sure they exist and your bucket is in the same region as this function.'.format(outkey, outbucket,"tmp.txt"))
       raise e

    tmp.close()
Ross Youngblood
  • Are you sure the code is formatted correctly? Specifically is the 'with open' block really inside the exception handling code? – FujiApple Aug 17 '16 at 23:11
  • Thanks, I found several other significant issues with this code. I will look at that as well. (Missed that one). – Ross Youngblood Aug 18 '16 at 15:35

2 Answers


A function that includes yield is actually a generator, whereas the Lambda handler needs to be a plain function that optionally returns a JSON-serializable value.
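A minimal sketch of the distinction (the event shape and names here are illustrative, not part of the Lambda API): keep the yield in a helper generator and have the handler consume it, returning an ordinary JSON-serializable value:

```python
import json

def read_lines(text):
    # generator: fine as a helper, but not as the handler itself
    for line in text.split('\n'):
        yield line

def lambda_handler(event, context):
    # consume the generator so its body actually runs
    lines = list(read_lines(event.get('body', '')))
    return {'count': len(lines)}   # JSON-serializable

out = lambda_handler({'body': 'a\nb\nc'}, None)
print(json.dumps(out))  # {"count": 3}
```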

Lei Shi

Thanks to Lei Shi for answering the specific point I was asking about, and thanks to FujiApple for pointing out a coding mistake I had missed in the original code. I was able to develop a solution without yield that copied the input file to the output. With Lei Shi's and FujiApple's comments I was then able to restructure that code into a sub-function, called by the lambda handler, which could be a generator.

from __future__ import print_function

import json
import urllib
import uuid
import boto3
import re
print('Loading IO function')

s3 = boto3.client('s3')

def processFile(inbucket, inkey, outbucket, outkey):
    try:
        infile = s3.get_object(Bucket=inbucket, Key=inkey)
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(inkey, inbucket))
        raise e

    inbody   = infile['Body']
    tmp_path = '/tmp/{}{}'.format(uuid.uuid4(), "tmp.txt")

    with open(tmp_path, 'w') as out:
        unfinished_line = ''
        chunk = inbody.read(4096)
        while chunk:
            chunk = unfinished_line + chunk
            # split on whatever, or use a regex with re.split()
            lines = chunk.split('\n')
            unfinished_line = lines.pop()
            for line in lines:
                out.write(line + '\n')
                yield line
            chunk = inbody.read(4096)
        # the final line has no trailing newline, so it is still in
        # unfinished_line when the read loop ends; emit it separately
        if unfinished_line:
            out.write(unfinished_line)
            yield unfinished_line

    #
    # Upload the file to S3
    #
    with open(tmp_path, "r") as tmp:
        try:
            s3.put_object(Bucket=outbucket, Key=outkey, Body=tmp)
        except Exception as e:
            print(e)
            print('Error putting object {} to bucket {}. Make sure the bucket exists and is in the same region as this function.'.format(outkey, outbucket))
            raise e

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    inbucket  = event['Records'][0]['s3']['bucket']['name']
    outbucket = "outlambda"
    inkey     = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    outkey    = "out" + inkey

    # processFile is a generator, so it must be iterated for its body to run
    for line in processFile(inbucket, inkey, outbucket, outkey):
        pass

This solution uses yield in a sub "generator" function. One subtlety: the final line of the input has no trailing newline, so it is left in unfinished_line when the read loop ends and must be written out separately; that is what the if clause handles.
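The chunked line-splitting logic can be checked locally by substituting io.StringIO for the S3 body (a hypothetical stand-in with the same read() interface). This sketch shows why the final line must be emitted after the loop:

```python
import io

def split_stream(body, chunk_size=4096):
    # same pattern as processFile, minus the S3 and tmp-file plumbing
    unfinished_line = ''
    chunk = body.read(chunk_size)
    while chunk:
        chunk = unfinished_line + chunk
        lines = chunk.split('\n')
        unfinished_line = lines.pop()
        for line in lines:
            yield line
        chunk = body.read(chunk_size)
    if unfinished_line:        # the last line has no trailing '\n'
        yield unfinished_line

body = io.StringIO("first\nsecond\nthird")  # no trailing newline
print(list(split_stream(body, chunk_size=4)))
# ['first', 'second', 'third']
```

A tiny chunk size is used deliberately so that lines span chunk boundaries, exercising the unfinished_line carry-over.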

Ross Youngblood