
I've seen the page on Amazon and understand that 1 RCU corresponds to reading a 4 KB item.

If I have a table with 50 items, I've read that a scan will read all 50 items and use 50 RCU. But let's say I run a query instead, and my table is 10 by 5. Will it still use 50 RCU?

Mike Dinescu
zuba
  • Query will only consume capacity for the items that are returned (assuming there is no filter, since filters are applied after the read, and that the total size is less than 1 MB) – Can Sahin May 04 '18 at 16:28

3 Answers


Scanning a table that contains 50 items will consume 50 RCU only if the combined size of the 50 items equals 200 KB (for a strongly consistent read, or 400 KB for an eventually consistent read). Most items are not that big: 50 items typically need only about 10 KB of storage, so a full scan of a 50-item table would cost only about 3 RCU with strongly consistent reads, or about half that with eventually consistent reads.
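To make the arithmetic concrete, here is a minimal Python sketch of that rule. It assumes 1 KB = 1024 bytes, that a Scan is billed on the total size of the items it reads rounded up once to the next 4 KB, and that an eventually consistent read costs half as much. `scan_rcu` is a hypothetical helper and the 200-byte item size is an assumed example value.

```python
import math

RCU_CHUNK_BYTES = 4 * 1024  # one strongly consistent RCU covers up to 4 KB read


def scan_rcu(total_bytes: int, strongly_consistent: bool = True) -> float:
    """Hypothetical helper: approximate RCU for a Scan reading `total_bytes` in total."""
    chunks = math.ceil(total_bytes / RCU_CHUNK_BYTES)  # round the total up once
    return chunks if strongly_consistent else chunks * 0.5


small_table = 50 * 200  # 50 items of ~200 bytes each (assumed size), about 10 KB
print(scan_rcu(small_table))                             # 3
print(scan_rcu(small_table, strongly_consistent=False))  # 1.5
print(scan_rcu(50 * RCU_CHUNK_BYTES))  # 50: only when every item is ~4 KB
```

The item count never enters the formula. Only the total number of bytes read matters.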

The consumed Read Capacity Units (RCU) depends on multiple factors:

If an item is read using a GetItem operation, then the consumed capacity is billed in increments of 4 KB based on the size of the item (i.e., with a strongly consistent read, a 200 B item and a 3 KB item would each consume 1 RCU, while a 5 KB item would consume 2 RCU).
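A quick sketch of the per-item rounding for GetItem, assuming 1 KB = 1024 bytes and the same half-cost rule for eventually consistent reads. `get_item_rcu` is a hypothetical helper, not part of any AWS SDK.

```python
import math


def get_item_rcu(item_bytes: int, strongly_consistent: bool = True) -> float:
    """Hypothetical helper: RCU for one GetItem, item size rounded up to 4 KB."""
    chunks = max(1, math.ceil(item_bytes / 4096))  # even a tiny item costs one chunk
    return chunks if strongly_consistent else chunks * 0.5


for size in (200, 3 * 1024, 5 * 1024):
    print(size, get_item_rcu(size))
# 200 1
# 3072 1
# 5120 2
```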

If you read multiple items using a Query or Scan operation, then the capacity consumed depends on the cumulative size of the items being accessed (you are billed even for items that a filter removes from the query or scan results). So if your query or scan accesses 10 items that are each approximately 200 bytes in size, it will consume only 1 RCU. If you read 10 items but each item is about 5 KB in size, then the total consumed capacity will be 13 RCU (50 KB / 4 KB = 12.5, rounded up to 13).
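The difference between rounding each item separately and rounding the cumulative size once can be sketched like this, assuming strongly consistent reads and 1 KB = 1024 bytes. Both helper names are hypothetical; `per_item_rcu` only exists to show what the cost would be if Query/Scan were billed per item.

```python
import math

CHUNK = 4096  # 4 KB


def per_item_rcu(sizes):
    """Hypothetical: cost if every item were rounded up to 4 KB on its own."""
    return sum(math.ceil(s / CHUNK) for s in sizes)


def query_rcu(sizes):
    """Hypothetical: Query/Scan cost, summing the sizes first and rounding once."""
    return math.ceil(sum(sizes) / CHUNK)


small = [200] * 10       # ten ~200-byte items
large = [5 * 1024] * 10  # ten ~5 KB items

print(query_rcu(small), per_item_rcu(small))  # 1 10
print(query_rcu(large), per_item_rcu(large))  # 13 20
```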

What's more, an eventually consistent read costs half as much, which is like doubling the size covered by each capacity unit. So reading the ten 5 KB items with eventual consistency would cost only 6.5 RCU (13 × 0.5).

You can read more about throughput capacity here.

A couple of things to note:

  • a single item may be as large as 400KB, so reading an item could consume as much as 100 RCU.
  • when calculating item size, attribute names count towards the item size as well, not just their values!
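As a rough illustration of why attribute names matter, here is a sketch that estimates the size of an item whose attributes are all strings by counting the UTF-8 bytes of each name plus its value. This is a simplification: numbers, binary, lists and maps follow other sizing rules that it does not handle, and `approx_item_size` is a hypothetical helper.

```python
import math


def approx_item_size(item: dict[str, str]) -> int:
    """Hypothetical, rough size of a string-only item: UTF-8 bytes of names + values."""
    return sum(len(name.encode("utf-8")) + len(value.encode("utf-8"))
               for name, value in item.items())


short_names = {"id": "user-42", "n": "Alice"}
long_names = {"customer_identifier": "user-42", "customer_display_name": "Alice"}

print(approx_item_size(short_names))  # 15
print(approx_item_size(long_names))   # 52: same values, longer names

MAX_ITEM_BYTES = 400 * 1024  # largest item DynamoDB allows
print(math.ceil(MAX_ITEM_BYTES / 4096))  # 100 RCU for one strongly consistent read
```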
Mike Dinescu
  • Useful summary. However, it's unclear to me what "accessed" means. If I query based on the Hash Key, would my query access only items with that key? How about the sort key? – nagy.zsolt.hun Jan 13 '19 at 00:04
  • Correct. A query will only access items of a particular hash key – Mike Dinescu Jan 13 '19 at 00:06
  • Thanks. If I also set a constraint on the sort key, would all items of the HashKey be accessed, or only the ones that also match the constraint on the sort key? – nagy.zsolt.hun Jan 13 '19 at 00:09
  • Not sure what you mean. A query **requires** a hash key. It is that hash key that gets accessed in that query. – Mike Dinescu Jan 13 '19 at 01:49
  • I'm asking about composite keys (consisting of a hash key + a sort key): multiple Items may have the same hash key. When running a query where I specify the hash key + a constraint on the sort key (e.g. a BETWEEN condition), which items get accessed? All items with the same Hash Key, or only the ones matching the constraint on the sort key? – nagy.zsolt.hun Jan 13 '19 at 09:55
  • You can verify this by asking for the consumed capacity to be returned in the query response, but only the items selected by the key condition should count towards the consumed capacity. – Mike Dinescu Jan 13 '19 at 17:19
  • @MikeDinescu if you performed 4 rapid queries in succession (as is often the case with geoqueries), are those 4 queries guaranteed to be calculated individually? Or might they be calculated twice, for example, if each query made it to DynamoDB within half a second? In other words, if the first and second query hit the API within 1 second, would the RCU calculation be on their combined item size and treated as one API call? – lurning too koad Jan 31 '19 at 17:17
  • This would be better asked as a separate question, but the TL;DR is that each query is a separate request, so capacity utilization is billed per request. – Mike Dinescu Jan 31 '19 at 17:30
  • @MikeDinescu Good idea https://stackoverflow.com/questions/54468374/calculating-dynamodb-rcu-pricing-per-day-not-per-second – lurning too koad Jan 31 '19 at 20:13
  • *Most items are not that big, so a 50 items typically only require about 10KB to store meaning a full scan for a table of 50 items, with eventual consistency, would only cost about 3 RCU.* **Is this really correct?** According to AWS' docs, *"One read request unit represents one strongly consistent read request, or two eventually consistent read requests, **for an item** up to 4 KB in size."* Nowhere in the docs does it say read capacity is cumulative... – user1322092 Feb 13 '19 at 02:53
  • Capacity consumed is for each operation (request), based on the amount of data accessed, not per item. Makes sense? – Mike Dinescu Feb 13 '19 at 04:23
  • @nagy.zsolt.hun To answer your question, all the items with the same hash key are accessed and then the filter is applied on top of them. Capacity consumption is also for all items accessed, not just the ones returned. – Vinay Nov 12 '19 at 13:11
  • This needs to be so much more clear in the documentation... Maybe the *pricing page* – danthegoodman Dec 06 '20 at 01:04
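The point that each query is a separate request, rounded and billed on its own, can be illustrated with a small sketch. It assumes strongly consistent reads and four queries of about 1 KB each (assumed sizes); `request_rcu` is a hypothetical helper.

```python
import math


def request_rcu(total_bytes: int) -> int:
    """Hypothetical: strongly consistent RCU for ONE request reading `total_bytes`."""
    return math.ceil(total_bytes / 4096)


# Four rapid queries, each reading ~1 KB (assumed sizes)
reads = [1024, 1024, 1024, 1024]

separately = sum(request_rcu(b) for b in reads)  # each request is rounded on its own
if_combined = request_rcu(sum(reads))            # NOT how the requests are billed

print(separately, if_combined)  # 4 1
```

Sending requests close together in time does not merge them, so many tiny queries each pay the 4 KB minimum.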

Query—Reads multiple items that have the same partition key value. All items returned are treated as a single read operation, where DynamoDB computes the total size of all items and then rounds up to the next 4 KB boundary. For example, suppose your query returns 10 items whose combined size is 40.8 KB. DynamoDB rounds the item size for the operation to 44 KB. If a query returns 1500 items of 64 bytes each, the cumulative size is 96 KB.

Ref: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/ProvisionedThroughput.html
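The rounding in the quoted documentation can be reproduced with a few lines, assuming 1 KB = 1024 bytes. `rounded_operation_size_kb` is a hypothetical helper that rounds a Query's cumulative item size up to the next 4 KB boundary.

```python
import math

KB = 1024


def rounded_operation_size_kb(total_bytes: float) -> int:
    """Hypothetical: round a Query's cumulative item size up to a 4 KB boundary."""
    return math.ceil(total_bytes / (4 * KB)) * 4


print(rounded_operation_size_kb(40.8 * KB))  # 44
print(rounded_operation_size_kb(1500 * 64))  # 96
```

Dividing the rounded size by 4 gives the strongly consistent RCU, e.g. 44 KB is 11 RCU.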

rajd

I smoke-tested this with the following entries, using a composite primary key, provisioned capacity, and eventually consistent reads:

  • entry#1 (size ~ 200B): hash key = foo, range key = foobar

  • entry#2 (size ~ 5KB): hash key = foo, range key = foojar

Queries to the table & reported consumption of RCUs:

  1. hash key EQUALS "foo" AND range key BEGINS_WITH "foo" --> both entries returned, 1 RCU consumed
  2. hash key EQUALS "foo" AND range key BEGINS_WITH "foobar" --> entry with size ~ 200B returned, 0.5 RCU consumed
  3. hash key EQUALS "foo" AND range key BEGINS_WITH "foojar" --> entry with size ~ 5KB returned, 1 RCU consumed

As already speculated above, this indicates that the items accessed are those matching the whole key condition (hash key and range key), not just the hash key.

By comparison, if you queried on the hash key alone and then used a filter to narrow the result down to a single item, the query would still access all items with that hash key and consume the full 1 RCU.
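The behaviour observed in these tests can be modelled with a small in-memory simulation. It is only a sketch under stated assumptions: `ITEMS` is a hypothetical copy of the two test entries with approximate sizes (5 KB taken as 5120 bytes), `simulate_query` is a hypothetical helper, reads are eventually consistent (0.5 RCU per 4 KB chunk), the key condition decides which items are read and billed, and a filter only trims the returned results afterwards.

```python
import math
from typing import Callable, Optional

# Hypothetical in-memory copy of the two test entries (sizes approximate)
ITEMS = [
    {"hash": "foo", "range": "foobar", "size": 200},
    {"hash": "foo", "range": "foojar", "size": 5 * 1024},
]


def simulate_query(hash_key: str, range_prefix: str,
                   filter_fn: Optional[Callable[[dict], bool]] = None):
    """Return (items returned, RCU consumed) for an eventually consistent Query."""
    # Key condition: these items are read, and therefore billed
    read = [i for i in ITEMS
            if i["hash"] == hash_key and i["range"].startswith(range_prefix)]
    # Filter: applied after the read, only affects what is returned
    returned = [i for i in read if filter_fn is None or filter_fn(i)]
    chunks = math.ceil(sum(i["size"] for i in read) / 4096)
    return len(returned), chunks * 0.5


print(simulate_query("foo", "foo"))     # (2, 1.0)
print(simulate_query("foo", "foobar"))  # (1, 0.5)
print(simulate_query("foo", "foojar"))  # (1, 1.0)
# Hash key only, then a filter down to the small item: still billed for both
print(simulate_query("foo", "", lambda i: i["range"] == "foobar"))  # (1, 1.0)
```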

L3p1
  • Point 3 would be 2 RCU since the size is >4KB – Punith Raj Sep 29 '22 at 06:52
  • Nope, the tests were performed using eventual consistency. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html – L3p1 Sep 29 '22 at 08:38