0

We have a DynamoDB table:

resource "aws_dynamodb_table" "prospectmaterials_table" {
  name              = "drinks"
  hash_key          = "PK"
  billing_mode      = "PAY_PER_REQUEST"
  read_capacity     = 5
  write_capacity    = 5
  
  attribute {
    name = "PK"
    type = "S"
  }
}

It currently contains 36,000 records.

An example of the data it contains:

PK Name Description Price
Coke-Coke Cola-Classic beverage-1.00 Coke Cola Classic beverage 1.00
Pepsi-Pepsi Cola-Another beverage-1.00 Pepsi Cola Another beverage 1.00
Dr. Pepper-Dr. Pepper-Yet another beverage-2.00 Dr. Pepper Yet another beverage 2.00

We want to retrieve all ~1000 records with the word "beverage" in the Description field.

Via an API Gateway endpoint, we want to query the table to retrieve each record which contains "beverage". This query currently breaks with "Invalid operator used in KeyConditionExpression: contains":

{
    "TableName": "drinks",
    "ConsistentRead": true,
    "ExpressionAttributeValues": {
        ":m": {
            "S": "beverage"
        }
    },
    "KeyConditionExpression": "contains(PK,:m)"
}

How should I construct this query so that it performs quickly and returns all the records I require?

Matt W
  • 11,753
  • 25
  • 118
  • 215
  • DynamoDB is the wrong tool for the job, you cannot design your table in a way this will ever work properly. DynamoDB is a key-value store, you are performing a query that is fundamentally different. For every contains-Query you will end up scanning the entire database. To get your current request working you would probably actually need to perform a `Scan` instead of a `Query`. – luk2302 Dec 21 '21 at 08:35
  • The wrong tool because of the amount of data or the fact I want to query with a `contains` condition? – Matt W Dec 21 '21 at 08:40
  • Because of the query you want to run. A DynamoDB cannot properly answer a contains query. If you run a query that returns 5 elements and then filter that dataset down to 2 elements based on the contains then you are fine. But *you* currently have to scan all the data for each request. This might work for small amounts of data, but is smelly already, and will break for a lot of data in terms of performance / latency. – luk2302 Dec 21 '21 at 08:43
  • "Give me all orders by user123, that happened yesterday and contain an item with 'iPad' in their name" is easily answerable if you properly partition and sort on userid and order date, "Give me all orders that contain an item with 'iPad' in their name" is not. – luk2302 Dec 21 '21 at 08:46
  • Why can it not properly answer a contains query? If I were to separate the data into further partitions and then run the query, would that make more sense? If I understand your point, it seems like Dynamo is not best suited to typical NoSQL DB use cases. – Matt W Dec 21 '21 at 08:47
  • No matter how you partition the data a full-text search cannot be natively performed in dynamodb. How would you theoretically split the description to partition on?? Instead you maybe need something like https://stackoverflow.com/questions/44291240/fulltext-search-dynamodb – luk2302 Dec 21 '21 at 08:49
  • I would like to perform "give me all the drinks priced 1.00 and have beverage in their name". Is that reasonable? – Matt W Dec 21 '21 at 08:50
  • Probably not, no. You would need to partition (or add another index) on the price (which sounds very weird) and then it depends on how partitioned the data is, are there 10 items with a price of 1.0 or are half of all items priced that way? – luk2302 Dec 21 '21 at 08:53
  • I think that this source can help you https://www.scavasoft.com/how-to-filter-dynamodb-records-effectively-using-query-or-scan-and-implement-a-full-text-search/ – Dilyan Atanasov Dec 21 '21 at 09:46
  • @DilyanAtanasov that does not improve ***anything***. You are doing a full table scan there, the worst thing you can possibly do. And it does not even touch on the full-text search OP is after. – luk2302 Dec 21 '21 at 11:58

1 Answers1

0

The CONTAINS operation you are trying to use is not supported as an operation in a KeyConditionExpression with the Query API. The only KeyConditionExpressions available are EQ | LE | LT | GE | GT | BEGINS_WITH | BETWEEN (see docs). On top of that, all of those operations except for EQ are reserved for the Sort Key. With a query operation you must specify a single partition, i.e. by using the = operator on the Partition Key you are querying. This means that not only will you will have to restructure your keys in order to accomplish the access pattern of:

We want to retrieve all ~1000 records with the word "beverage" in the Description field.

You will also probably have to change the access pattern itself. Something more feasible with DynamoDB might read as:

We want to retrieve all ~1000 items with the type of beverage

This is because equality on a partition key is a prerequisite to every single query operation you perform on your base table.

If you can't change the way your table is structured, then DynamoDB is likely not right tool for the job. If you can though, there are certainly ways of evaluating and shaping data to work with NoSQL tables in general, and DynamoDB specifically.

The best approach would be to lay out all of your access patterns, consult best practices documentation provided by AWS (linked earlier), and design your base table around your main patterns, while leveraging secondary indexes to supplement secondary patterns if necessary.

bd_
  • 96
  • 1
  • 4