I am using DynamoDB (NoSQL) for my project. How can I auto-generate a key that I can use for queries?

import uuid
from datetime import datetime

import boto.dynamodb
# `settings` (providing ACCESS_KEY and PASS_KEY) is assumed to be importable
# from the application's configuration module.


def DynamoDB_view(tableName, campaign_tag_app_group_map_id, campaign_id,
                  tag_id, tag_type, app_id, group_id, group_p, tenant_id,
                  insertion_timestamp, insertion_user_id):
    print "in func DynamoDB_view"

    def insert_to_dynamo(conn, tableName, campaign_tag_app_group_map_id,
                         campaign_id, tag_id, tag_type, app_id, group_id,
                         group_p, tenant_id, insertion_timestamp,
                         insertion_user_id):
        print "in Insert"
        print tableName
        # Used a random value as the key just for now; this is inappropriate
        # because the value cannot be reproduced at query time.
        data = uuid.uuid4().hex[:16]
        table = conn.get_table(tableName)
        item_data = {
            'campaign_id': str(campaign_id),
            'tag_id': tag_id,
            'tag_type': tag_type,
            'app_id': app_id,
            'group_id': str(group_id),
            'group_p': group_p,
            'tenant_id': str(tenant_id),
            'insertion_timestamp': str(datetime.now()),
            'insertion_user_id': str(insertion_user_id),
        }
        item = table.new_item(
            hash_key=data,                # the random value above
            range_key='Check this out!',  # placeholder range key
            attrs=item_data,
        )
        item.put()

    def connection_dynamo(tableName, campaign_tag_app_group_map_id,
                          campaign_id, tag_id, tag_type, app_id, group_id,
                          group_p, tenant_id, insertion_timestamp,
                          insertion_user_id):
        conn = boto.dynamodb.connect_to_region(
            'us-east-1',
            aws_access_key_id=settings.ACCESS_KEY,
            aws_secret_access_key=settings.PASS_KEY)

        insert_to_dynamo(conn, tableName, campaign_tag_app_group_map_id,
                         campaign_id, tag_id, tag_type, app_id, group_id,
                         group_p, tenant_id, insertion_timestamp,
                         insertion_user_id)

    connection_dynamo(tableName, campaign_tag_app_group_map_id, campaign_id,
                      tag_id, tag_type, app_id, group_id, group_p, tenant_id,
                      insertion_timestamp, insertion_user_id)
   

1 Answer

Here's a link to the DynamoDB Query and Scan docs:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html

For a query, you must supply the hash key and you must check it for equality. The range key, if your table has one, is optional in a query, and you can perform a wider set of operations on it than just equality. For performance, you do not want a "hot key" for your hash key (that is, using the same key value all the time).
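For illustration, with the boto v1 layer-2 API your code already uses, a query looks roughly like this (the table name and key values here are made up, and `conn` is a connection like the one in your code):

from boto.dynamodb.condition import BEGINS_WITH

table = conn.get_table('my_table')  # hypothetical table name

# The hash key must match exactly; the range-key condition is optional
# and supports richer operators (BEGINS_WITH, BETWEEN, GT, ...).
for item in table.query(hash_key='some-exact-key',
                        range_key_condition=BEGINS_WITH('2014-')):
    print item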

A lot of the answer boils down to what you'll have in hand when you do the query, and whether you have to worry about that killing performance. Auto-generating something random will save you from the hot-key problem, but you may not be able to reproduce those values when you go back to query your data (even if you always use the same seed for an RNG, your head may explode before you regenerate the hash key you want). That might force you into doing a scan instead of a query, which is generally undesirable.

Will you have any of the campaign_id, group_id, tenant_id, etc. fields available to you at query time? If so, you at least have a candidate for your hash key. You should still think about how much data the table will hold and how much of it will share, say, the same group_id. If you have both a group_id and a tenant_id at query time and there is much more diversity among tenant_id values, use tenant_id. You can also combine two IDs to make your key value if that helps spread the data around, as in the sketch below.
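A composite hash key might be built like this (attribute names taken from your item; the separator is arbitrary):

# Combining tenant_id and group_id gives a key that is reproducible at
# query time and better distributed than group_id alone.
hash_key = '%s#%s' % (tenant_id, group_id)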

If you only have group_id and only a small number of groups, appending some randomness to the end of the group_id to avoid hot keys won't help you: from the point of view of doing a query, you'll be back in the same situation, with a bunch of keys that are essentially unrecoverable. In this case, maybe the least painful option is a table per group_id, totally random keys for a good spread, and accepting that your data forces you to do scans, as below.
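If your data really does force scans, with boto v1 that looks something like this (the filter attribute and value are just examples):

from boto.dynamodb.condition import EQ

# A scan reads the entire table and filters afterwards; expensive,
# but it works when no usable hash key exists at query time.
for item in table.scan(scan_filter={'group_id': EQ('42')}):
    print item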

If you can get a good hash key, then your most common queries may dictate your choice of range key. If you usually query for records from the last 24 hours, insertion_timestamp might be a good choice (sketched below). If some other factor enters into a lot of queries, use that instead; for example, if you limit query results to certain campaigns and those campaigns don't have completely random names. And if you have, say, three common queries that rely on different ranges/criteria, you might want to add some local secondary indexes (see "Difference between local and global indexes in DynamoDB").
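Here's a rough sketch of a "last 24 hours" query with insertion_timestamp as the range key, assuming the timestamps are stored as sortable strings (which str(datetime.now()) in your code produces):

from datetime import datetime, timedelta

from boto.dynamodb.condition import GE

cutoff = str(datetime.now() - timedelta(hours=24))

# Equality on the hash key plus a range-key condition on the timestamp.
for item in table.query(hash_key='some-exact-key',
                        range_key_condition=GE(cutoff)):
    print item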

To get back to what you might be asking, if you have nothing in hand when you go to query the data, then you may be screwed and you may have to do a scan to get back your data. In this case, using something as random as possible for your hash key will at least be nice to your writes and will ensure a good distribution of your data.

Sorry this got kind of rambly; hopefully there's something helpful in there. If I totally misunderstood or there's some other unstated constraint, please edit your question to reflect it.
