Specifically, when is it better to use one or the other? I am using BatchGetItem now and it seems pretty damn slow.

sometimesiwritecode

1 Answer

In terms of efficiency for retrieving a single item whose partition key (and sort key, if the table uses one) you already know, GetItem is more efficient than querying or scanning. BatchGetItem is a convenient way to retrieve a bunch of items whose partition/sort keys you know, but its only efficiency gain over multiple GetItem calls is the network traffic saved by batching the round trips.
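
For illustration, here is a minimal boto3 sketch of the two calls, assuming a hypothetical table `my-table` with partition key `pk` and sort key `sk` (all names and values are made up):

```python
import boto3

# Hypothetical table and key names, for illustration only.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")

# GetItem: one round trip for a single, fully-known key.
item = table.get_item(Key={"pk": "user#123", "sk": "profile"}).get("Item")

# BatchGetItem: up to 100 fully-known keys per request. The saving is in
# round trips only; each item still consumes its own read capacity, and any
# UnprocessedKeys in the response must be retried by the caller.
resp = dynamodb.batch_get_item(
    RequestItems={
        "my-table": {
            "Keys": [
                {"pk": "user#123", "sk": "profile"},
                {"pk": "user#456", "sk": "profile"},
            ]
        }
    }
)
items = resp["Responses"]["my-table"]
```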

However, if you only have partial information about an item then you can't use GetItem/BatchGetItem and you have to either Scan or Query for the item(s) you care about. In such cases Query will be more efficient than a Scan, since the query already narrows the table space to a single partition key value. Filter Expressions don't contribute much to efficiency, because they are applied after the items have been read, but they can save you some network traffic.
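
Roughly, with the same hypothetical table as above, the difference looks like this (the key condition narrows the read; the filter only trims the response):

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table

# Query: the key condition restricts the read to a single partition key value
# (and, optionally, a sort-key range) before any items are read.
orders = table.query(
    KeyConditionExpression=Key("pk").eq("user#123") & Key("sk").begins_with("order#")
)["Items"]

# FilterExpression: applied after the items are read, so you still consume read
# capacity for everything the key condition matched; only the response shrinks.
open_orders = table.query(
    KeyConditionExpression=Key("pk").eq("user#123"),
    FilterExpression=Attr("status").eq("OPEN"),
)["Items"]
```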

There is also the case where you need to retrieve a large number of items. If you need lots of items with the same partition key, a Query becomes more efficient than multiple GetItem (or BatchGetItem) calls. And if you need to retrieve items making up a significant portion of your table, a Scan is the way to go.
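
For the "lots of items under one partition key" case, a paginated Query is the usual pattern (again using the hypothetical `pk` attribute); each page returns at most 1 MB, so you follow `LastEvaluatedKey` until it's gone:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table

def query_all(pk_value):
    """Fetch every item under one partition key, following pagination."""
    items, start_key = [], None
    while True:
        kwargs = {"KeyConditionExpression": Key("pk").eq(pk_value)}
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key
        page = table.query(**kwargs)
        items.extend(page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return items

all_user_items = query_all("user#123")
```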

Mike Dinescu
  • You say that "Filter Expressions don't really contribute all that much to the efficiency but they can save you some network traffic." I understand that a filtered result will return less data, but how performant is the actual filtering? What if, for example, multiple IN operators were performed on a string set as part of a filter expression? I can't find any info on the efficiency of such a pattern. Cheers – theSiberman Dec 25 '17 at 02:10
  • Absolutely true. In terms of network bandwidth you can save a lot by filtering on the server rather than on the client, assuming the filtering is actually substantial. And the effects are even more significant on slower connections. With DynamoDB you can retrieve up to 1MB of data per query, so you can save anywhere from a few kilobytes to a whole megabyte per query depending on the sparseness of the filtered result set. I’m not sure what efficiency you are referring to with respect to multiple IN clauses. – Mike Dinescu Dec 25 '17 at 04:34
  • Since the IN operators are applied to the result before it is returned to the client then the overhead is negligible. – Mike Dinescu Dec 25 '17 at 04:35
  • Thanks Mike, that makes perfect sense from a bandwidth perspective. The efficiency I'm referring to with the multiple IN clauses is simply the computation time, and therefore response time. If it's iterating through a string set multiple times to find the values, will this be noticeable? – theSiberman Dec 26 '17 at 21:47
  • I would say that since it would only be testing a few thousand items the actual overhead will be minimal. And presumably you would have to perform the computation either way; whether you do it server side or client side, it would take time. – Mike Dinescu Dec 27 '17 at 01:35
  • I know this is super old but wondering if anyone could elaborate on "There is also the case when you need to retrieve a large number of items. If you need lots of items with the same partition key, then a query becomes more efficient than multiple GetItem (or BatchGetItem calls)." What constitutes a large number of items? If I have to return say 50 items (all with the same partition key but different sort keys), is it better (in terms of cost and performance) to do a batch get or a query? I couldn't find any good information on this... seems like query is cheaper but :shrug:? – hurlbz Apr 30 '21 at 14:37
  • It depends on how many items have that same partition key in total (meaning, are you looking to retrieve 50 out of 50-60 items, or 50 out of 500) and, if it's a subset, whether you can use sorting to your advantage. And then finally you need to factor in the cost, especially if the items are smaller than 4KB each. Query becomes very appealing when you need to read lots of relatively small items because of the way you pay for capacity consumed in aggregate for a query. – Mike Dinescu May 01 '21 at 06:16