1

I have a simple dynamodb2 table that contains task_names (string) and their start times (number/float). How can I get all records that have start time greater than a given number X? Currently, I tried making start time as a global secondary index and used the following:

table hashkey is task_name and range key is start time. I made a secondary global index on start_time (start_time-index) and use it to query below.

recs =tab.query(start_time__gte=1, index='start_time-index')

However, it says:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nail/home/osarood/work/.venv/local/lib/python2.7/site-packages/boto/dynamodb2/results.py", line 62, in __next__
    self.fetch_more()
  File "/nail/home/osarood/work/.venv/local/lib/python2.7/site-packages/boto/dynamodb2/results.py", line 146, in fetch_more
    results = self.the_callable(*args, **kwargs)
  File "/nail/home/osarood/work/.venv/local/lib/python2.7/site-packages/boto/dynamodb2/table.py", line 1132, in _query
    **kwargs
  File "/nail/home/osarood/work/.venv/local/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 1522, in query
    body=json.dumps(params))
  File "/nail/home/osarood/work/.venv/local/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 2100, in make_request
    retry_handler=self._retry_handler)
  File "/nail/home/osarood/work/.venv/local/lib/python2.7/site-packages/boto/connection.py", line 937, in _mexe
    status = retry_handler(response, i, next_sleep)
  File "/nail/home/osarood/work/.venv/local/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 2140, in _retry_handler
    response.status, response.reason, data)
boto.dynamodb2.exceptions.ValidationException: ValidationException: 400 Bad Request
{u'message': u'Query key condition not supported', u'__type': u'com.amazon.coral.validate#ValidationException'}
Krease
  • 15,805
  • 8
  • 54
  • 86
Osman Sarood
  • 121
  • 10

1 Answers1

-2

(EDITED - see revision history for original incorrect answer)

The reason you're getting an error is because you haven't specified the hashkey for the query - you must include the hash key attribute name and value as an EQ condition.

In order to solve this, there are a couple of alternatives:

If you know the task name when performing your range query, you don't need a GSI; just include the taskName in the query with EQ condition.

If you don't know the task name, you are correct that a GSI can solve this, but there are limitations to how it works. The thing to keep in mind with GSI is that they're basically just tables as well, but they're maintained under the hood by Dynamo. As such, they also need a hashkey and rangekey to specify uniqueness.

  • In order to query without the task name, you will need an alternative hashkey - to keep it simple, I'd suggest an approach similar to this one:
    • Create a GSI with the hash key as YearMonth (e.g., 201508) and range key timestamp+taskName (The reason for appending taskName to the GSI range key is to ensure proper uniqueness of the hashkey/rangekey combination in your GSI)
    • Query the GSI multiple times, one query for each month. The queries are also filtered by range key timestamp > [given timestamp].
    • A drawback with this approach is that if you are creating lots of tasks quickly, you're likely to have a hot hash key on YearMonth, which will impact your table throughput - this answer has further tips for dealing with time series data:
    • make the GSI hashkey more granular
    • use different tables (generated table names based on date ranges) possibly with different hashkey granularity, to manage hot/cold data

As a last resort, consider using a full table scan (this is a last resort, as it'll really impact your throughput capacity)

Krease
  • 15,805
  • 8
  • 54
  • 86
  • Adding a range key (likely) won't help here. As noted in your answer, range keys are only useful when you're querying a entry with a known hash key. So unless all the hash keys are known (in this example, task names) in advance, you cannot query just by start_time. – jonson Mar 03 '15 at 14:29
  • @jonson - quite right. I don't know what I was thinking when I wrote this. Funny that OP accepted it, which prevents me from deleting it. Please feel free to edit in corrections to make it useful. – Krease Dec 04 '18 at 18:07
  • @jonson - I've substantially updated the answer. I'm open to further edit suggestions if you have them. – Krease Dec 04 '18 at 19:05