I am trying to build a Lambda function with Python and save the dict outputs to a DynamoDB table. One of the dict outputs is a floating-point NumPy array. Similar questions have probably been asked before; for example, this one suggests pickle.dumps or numpy.save (How can I serialize a numpy array while preserving matrix dimensions?).

import boto3
import numpy as np
import pickle

# Dynamodb table example with primary keys first_name and last_name
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')

# Dict output example
output = {
    'first_name': 'David',
    'last_name': 'Adams',
    'rate': np.random.rand(100000)
}

# Putting the dict output directly as an item to DynamoDB
# will fail: boto3 cannot serialize a numpy array, and DynamoDB numbers must be Decimal, not float
table.put_item(
    Item = output
)

# OR convert the array to a string, but str() elides long arrays with '...', so the numeric data is truncated
table.put_item(
    Item = {
        'first_name': output['first_name'],
        'last_name': output['last_name'],
        'rate': str(output['rate'])
    }
)

# OR use pickle.dumps to convert the numeric array to bytes.
# However, this fails when the array is too large, since a DynamoDB item has a 400 KB size limit
table.put_item(
    Item = {
        'first_name': output['first_name'],
        'last_name': output['last_name'],
        'rate': pickle.dumps(output['rate'])
    }
)
Jian
  • If you can only store 400kb at a time, you'll probably have to split up your pickle file into 400kb chunks, store each chunk, then reassemble (a minimal chunking sketch follows these comments). Are you sure you want to store a numpy array in Dynamo? Sounds awkward, and definitely not what DynamoDB is for. What are you really trying to do? – Matt Messersmith Aug 09 '18 at 19:50
  • The dict output is a semi-/un-structured dataset I am hoping to store in a database and later query to visualize. I could probably store them as JSON files in S3, but ideally I want a database. – Jian Aug 09 '18 at 20:26
  • Do you have any hard latency requirements? It's usually better to just get it working 9 times out of 10, and optimize later. You can get some tests in place and get the rest of your application working, then optimize this pain point if S3 is too slow for your use cases. `boto3` is a nice API, and reads from S3 really aren't all that bad (you'll get at least 10 Mbps, and you can easily get more like 50 Mbps); an S3 sketch also follows these comments. – Matt Messersmith Aug 09 '18 at 23:41
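
Following up on the chunking suggestion in the first comment: a minimal sketch, assuming a hypothetical table user_blobs with partition key blob_id (string) and sort key part (number). The table name, key names, and the 350 KB chunk size are illustrative, not from the question.

import pickle
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
# Hypothetical table: partition key 'blob_id' (S), sort key 'part' (N)
blobs = dynamodb.Table('user_blobs')

CHUNK_SIZE = 350_000  # bytes, safely under the 400 KB item limit

def put_array(blob_id, arr):
    # Pickle the array and store the bytes as numbered chunk items
    data = pickle.dumps(arr)
    for offset in range(0, len(data), CHUNK_SIZE):
        blobs.put_item(Item={
            'blob_id': blob_id,
            'part': offset // CHUNK_SIZE,
            'data': data[offset:offset + CHUNK_SIZE]
        })

def get_array(blob_id):
    # Read the chunks back in order, reassemble, and unpickle
    resp = blobs.query(KeyConditionExpression=Key('blob_id').eq(blob_id))
    items = sorted(resp['Items'], key=lambda item: item['part'])
    return pickle.loads(b''.join(item['data'].value for item in items))

One caveat: a single query page returns at most 1 MB, so a blob larger than that would need pagination via LastEvaluatedKey.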
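
And a sketch of the S3 route from the last comment: write the pickled array to a bucket and keep only its key in the DynamoDB item, so the item stays tiny no matter how large the array grows. The bucket name my-output-bucket and the attribute rate_s3_key are placeholders.

import pickle
import boto3
import numpy as np

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')

BUCKET = 'my-output-bucket'  # placeholder bucket name

output = {
    'first_name': 'David',
    'last_name': 'Adams',
    'rate': np.random.rand(100000)
}

# Store the large array in S3 ...
key = 'rates/{}_{}.pkl'.format(output['first_name'], output['last_name'])
s3.put_object(Bucket=BUCKET, Key=key, Body=pickle.dumps(output['rate']))

# ... and keep only the pointer in DynamoDB
table.put_item(Item={
    'first_name': output['first_name'],
    'last_name': output['last_name'],
    'rate_s3_key': key
})

# Later: read the item, fetch the key, and load the array back
obj = s3.get_object(Bucket=BUCKET, Key=key)
rate = pickle.loads(obj['Body'].read())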

0 Answers