DynamoDB Between query on GSI does not work as expected

Question

It is a jobPosts schema that has a posted_date as one of the attributes. The goal is to query all the job posts between two dates. Here is the schema for your reference:

{
  'job_id': {S: jobInfo.job_id},
  'company': {S: jobInfo.company},
  'title': {S: jobInfo.title},
  'posted_on': {S: jobInfo.posted_on},
}

posted_on' is based on ISO string (2019-11-10T10:52:38.013Z). job_id is the primary key (partition key) and since I need to query the dates, I created GSI(partition key) on posted_on. Now here is the query:

  const params = {
    TableName : "jobPosts",
    IndexName: 'date_for_filter_purpose-index',
    ProjectionExpression:"job_id, company, title, posted_on",
    KeyConditionExpression: "posted_on BETWEEN :startDate AND :endDate",
    ExpressionAttributeValues: {
        ":startDate": {S: "2019-10-10T10:52:38.013Z"},
        ":endDate": {S: "2019-11-10T10:52:38.013Z"}
    }
  };

I have one document in dynamoDB and here it is:

{
  job_id:,
  company: "xyz",
  title: "abc",
  posted_on: "2019-11-01T10:52:38.013Z"
}

Now, on executing this, I get the following error:

{
    "message": "Query key condition not supported",
    "code": "ValidationException",
    "time": "2019-11-11T06:15:37.231Z",
    "requestId": "J078NON3L8KSJE5E8I3IP9N0IBVV4KQNSO5AEMVJF66Q9ASUAAJG",
    "statusCode": 400,
    "retryable": false,
    "retryDelay": 12.382362030893768
}

I don't know what is wrong with the above query.

Update after Tommy Answer: I removed the GSI on posted_on and re-created the table with job_id as partition key and posted_on as sort key. I get the following error:

{
    "message": "Query condition missed key schema element: job_id",
    "code": "ValidationException",
    "time": "2019-11-12T11:01:48.682Z",
    "requestId": "M9E793UQNJHPN5ULQFJI2NR0BVVV4KQNSO5AEMVJF66Q9ASUAAJG",
    "statusCode": 400,
    "retryable": false,
    "retryDelay": 42.52613025785952
}

As per this SO answer, GSI should be able to query the dates using BETWEEN keyword.

jarmod · Accepted Answer · 2019-11-13T00:39:45.200

The answer you refer to relates to a query where the partition key has a specific value and the sort key is in a given range. It's analagous to select * from table where status=Z and date between X and Y. That's not what you're trying to do, if I read your question correctly. You want select * from table where date between X and Y. You cannot do this with DynamoDB query - you cannot query a partition key by range.

If you knew that your max range of query dates was on a given day then you could create a GSI with a partition key set to the computed YYYYMMDD value of the date/time and whose sort key was the full date/time. Then you could query with a key condition expression for a partition key of the computed YYYYMMDD and a sort key between X and Y. For this to work, the YYYYMMDD of X and Y would have to be the same.

If you knew that your max range of query dates was a month then you could create a GSI with partition key set to the computed YYYYMM of the date/time and whose sort key was the full date/time. For this to work, the YYYYMM of X and Y would have to be the same.

That would mean that you need to create a GSI each time you want to query for the range of items from a particular timeframe. Although possible, I don't think that creation of the GSI on the fly to run a query against it would be practical, especially given the time required to build a GSI. I guess you could run a monthly task to create a new GSI for each month. But then from the cost perspective (if not anything else), it would be sub-optimal. Have you seen this technique used anywhere? — Tommy, Nov 12 '19 at 23:59
I wouldn't create the GSI on the fly, I'd have it as a maintained index, projecting the attributes needed to answer this query. A few resources on DynamoDB and date ranges: https://www.edureka.co/community/938/querying-dynamodb-by-date, https://www.dynamodbguide.com/querying/, https://stackoverflow.com/a/48094172/271415, https://stackoverflow.com/a/36145481/271415. — jarmod, Nov 13 '19 at 00:17

score 1 · Answer 2 · edited Jun 20 '20 at 09:12

1

I guess it's a little counter-intuitive but DynamoDB supports only .eq condition on partition key attributes.

As per KeyConditions Documentation

You must provide the index partition key name and value as an EQ condition. You can optionally provide a second condition, referring to the index sort key.

Furthermore, in Query API Documentation you can find the following

The condition must perform an equality test on a single partition key value.

The condition can optionally perform one of several comparison tests on a single sort key value. This allows Query to retrieve one item with a given partition key value and sort key value, or several items that have the same partition key value but different sort key values.

That explains the error message you are getting.

One of the solutions might be to create a composite primary key with posted_on attribute as the sort key, instead of the GSI. Then, depending on your use case and access pattern, you'll need to figure out which attribute would work best as the partition key.

This blog should help you to choose the right partition key for your schema.

edited Jun 20 '20 at 09:12

Community

1
1

answered Nov 11 '19 at 12:02

Tommy

1,006
3
13
26

Thank you @tommy for the answer. So you mean, after making the posted_on as sort key, you should use job_id still in query? If that is so, that is not my use case :( I have updated the answer. – Lakshman Diwaakar Nov 12 '19 at 11:14
Yes, you would need to use the entire primary key in your query, both the hash key and the sort key. Please note that in the example you linked in the updated question the GSI is also using composite primary key (`status` as a partition key and `createdDateTime` as a sort key) and they both are used in the query. Of course, you can create a GSI with a composite primary key, for example: `company` as a partition key and `posted_on` as a sort key. And keep `job_id` as a primary key in your original table. All depends on your access pattern and use case. – Tommy Nov 12 '19 at 11:45
1

Hey Tommy, I think you are right. Now, I moved to scan, it works as expected, but it is inefficient. But I don't get the point. I want all the jobs between the start and end dates. How can I provide the job_id, the partition id? Is my data modeling wrong? – Lakshman Diwaakar Nov 12 '19 at 13:53
Whenever a simple primary key or composite primary key, you always have to provide a specific partition key. If you want to use `query` without going to a specific item (identified by a partition key) you have to use sort key for that purpose, but still, need to provide specific partition key. ("only `.eq` condition supported" part from my answer). Otherwise, you are correct that you can use `scan` with all its drawbacks. Check out this blog, perhaps it will give you some ideas to choose your primary key. https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/ – Tommy Nov 12 '19 at 14:23
@LakshmanDiwaakar In addition to my previous comment, I have also updated the answer. – Tommy Nov 12 '19 at 20:48

DynamoDB Between query on GSI does not work as expected

2 Answers2