
I'm using the AWS SDK for JavaScript (Node.js) to read data from a DynamoDB table. The auto scaling feature does a great job most of the time, and the consumed Read Capacity Units (RCU) stay really low for most of the day. However, there is a scheduled job that runs around midnight and consumes about 10x the provisioned RCU, and since auto scaling takes some time to adjust the capacity, there are a lot of throttled read requests. Furthermore, I suspect my requests are not being completed (though I can't find any exceptions in my error log).

In order to handle this situation, I've considered increasing the provisioned RCU using the AWS API (updateTable), but calculating how many RCU my application needs may not be straightforward.
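Something like this is what I had in mind (just a rough sketch using the SDK v2 updateTable call; the table name and capacity numbers are placeholders I would still need to figure out):

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

// Sketch: raise the provisioned RCU before the midnight job runs.
// Both ReadCapacityUnits and WriteCapacityUnits must be supplied.
dynamodb.updateTable({
  TableName: 'MyTable',            // placeholder
  ProvisionedThroughput: {
    ReadCapacityUnits: 500,        // guessed peak RCU for the job
    WriteCapacityUnits: 10
  }
}, (err, data) => {
  if (err) console.error(err);
  else console.log('Table status:', data.TableDescription.TableStatus);
});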

So my second idea was to retry failed requests and simply wait for auto scaling to increase the provisioned RCU. As pointed out by the AWS docs and some Stack Overflow answers (particularly regarding ProvisionedThroughputExceededException):

The AWS SDKs for Amazon DynamoDB automatically retry requests that receive this exception. So, your request is eventually successful, unless the request is too large or your retry queue is too large to finish.

I've read similar questions (this one, this one and this one) but I'm still confused: is this exception raised only when the request is too large or the retry queue is too large to finish (that is, after the automatic retries), or can it also be raised before the retries?

Most importantly: is that the exception I should be expecting in my context, so I can catch it and retry until auto scaling increases the RCU?

1 Answer


Yes.

Every time your application sends a request that exceeds your provisioned capacity, you get a ProvisionedThroughputExceededException from DynamoDB. However, your SDK handles this for you and retries. For DynamoDB, the default retry delay starts at 50 ms, the default number of retries is 10, and the backoff is exponential.

This means you get retries at:

  • 50ms
  • 100ms
  • 200ms
  • 400ms
  • 800ms
  • 1.6s
  • 3.2s
  • 6.4s
  • 12.8s
  • 25.6s

If after the 10th retry your request has still not succeeded, the SDK passes the ProvisionedThroughputExceededException back to your application and you can handle it how you like.
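If you do want to catch it yourself, a rough sketch (assuming the SDK v2 DocumentClient; the retry limit and the 30-second wait are just illustrative values, not recommendations) could look like this:

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Sketch: after the SDK's own retries are exhausted, keep retrying
// ourselves to give auto scaling time to raise the provisioned RCU.
async function getWithRetry(params, attempt = 0) {
  try {
    return await docClient.get(params).promise();
  } catch (err) {
    if (err.code === 'ProvisionedThroughputExceededException' && attempt < 5) {
      await sleep(30000);          // wait for auto scaling to catch up
      return getWithRetry(params, attempt + 1);
    }
    throw err;                     // other errors, or too many attempts
  }
}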

You could handle it by increasing the provisioned throughput, but another option would be to change the default retry behaviour when you create the DynamoDB client. For example:

const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2
const dynamodb = new AWS.DynamoDB({maxRetries: 13, retryDelayOptions: {base: 200}});

This would mean you retry up to 13 times with an initial delay of 200ms, so the delay before the final retry grows to 819.2s rather than 25.6s, giving auto scaling much more time to raise the provisioned RCU before the exception reaches your application.
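For reference, the delays above follow base * 2^retryCount (the SDK may also apply some jitter, so the actual delays vary); a quick sketch to print the schedule:

// Idealised backoff schedule: delay = base * 2^retryCount (in ms)
function backoffSchedule(maxRetries, base) {
  const delays = [];
  for (let i = 0; i < maxRetries; i++) {
    delays.push(base * Math.pow(2, i));
  }
  return delays;
}

console.log(backoffSchedule(10, 50));   // [50, 100, ..., 25600]  -> last delay 25.6s
console.log(backoffSchedule(13, 200));  // [200, 400, ..., 819200] -> last delay 819.2s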

F_SO_K
  • So, what can actually be done to avoid this situation altogether? Should I step away from the "on-demand" provisioning? Is that the problem? – bvdb Aug 21 '19 at 22:13
  • When this question was asked, on-demand provisioning did not exist. Yes, in the majority of cases you will probably want to use on-demand provisioning (unless you have massive and fairly consistent usage). On-demand, for the most part, removes this problem entirely. It can still occur in theory if you have a dramatic spike in use beyond the capability of an on-demand table to scale up. – F_SO_K Aug 28 '19 at 11:17
  • I believe this may have been the case. I was iterating all records in a table, removing a field from those records. – bvdb Aug 28 '19 at 23:15
  • 1
    @F_SO_K can u show how to do this config update in python 3.7? – ABCD Sep 17 '19 at 07:31
  • As mentioned before, this will still happen with on-demand or provisioned capacity if the load increases heavily from one minute to the next. You have to implement your own "spin-up" logic to completely avoid that. – jimpic Nov 05 '21 at 09:14
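For anyone wanting to try the on-demand route mentioned in the comments, a rough sketch of switching an existing table via updateTable's BillingMode parameter (the table name is a placeholder):

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

// Sketch: switch an existing table to on-demand (pay-per-request) billing,
// removing the need to provision RCU/WCU at all.
dynamodb.updateTable({
  TableName: 'MyTable',            // placeholder
  BillingMode: 'PAY_PER_REQUEST'
}, (err) => {
  if (err) console.error(err);
  else console.log('Billing mode update started');
});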