In the process of migrating an existing Node.js (Hapi.js) + RethinkDB stack from an OVH VPS (their smallest one) to AWS Lambda (Node) + DynamoDB, I've recently come across a huge performance issue.
The usage is rather simple: people use an online tool, and "stuff" gets saved in the DB through a Node.js server/Lambda. That "stuff" takes some space, around 3 KB non-gzipped (a complex object with lots of keys and children, which is why a NoSQL solution makes sense).
There is no issue with the saving itself (for now...): not many people use the tool and there isn't much simultaneous writing, which is what makes a Lambda preferable to a 24/7 running VPS.
The real issue is when I want to download those results.
- Node + RethinkDB takes about 3 sec to scan the whole table and generate a CSV file to download.
- AWS Lambda + DynamoDB times out after 30 sec. Even if I paginate the results to download only 1,000 items, it still takes 20 sec (no timeout this time, just very slow). There are 2,200 items in that table, so we can deduce that downloading the whole table would take around 45 sec, if AWS Lambda didn't time out after 30 sec.
So the operation takes around 3 sec with RethinkDB, and would theoretically take 45 sec with DynamoDB, for the same amount of fetched data.
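For reference, the fetch is essentially the standard paginated Scan loop, sketched here against the aws-sdk v2 `DocumentClient` interface (`scanAll` is just an illustrative helper name, not an AWS API):

```javascript
// Sketch of a full-table Scan with pagination, assuming a
// DocumentClient-like object exposing scan(params).promise().
async function scanAll(docClient, params) {
  const items = [];
  let ExclusiveStartKey;
  do {
    const page = await docClient
      .scan(Object.assign({}, params, { ExclusiveStartKey }))
      .promise();
    items.push(...page.Items);
    // LastEvaluatedKey is undefined once the last page has been read
    ExclusiveStartKey = page.LastEvaluatedKey;
  } while (ExclusiveStartKey);
  return items;
}

// Real usage would look something like:
//   const AWS = require('aws-sdk');
//   const docClient = new AWS.DynamoDB.DocumentClient();
//   const items = await scanAll(docClient, { TableName: 'results' });
```

Each Scan page is capped at 1 MB of data, so the whole table can never come back in a single call; the loop above is what turns 5 MB into five-plus round trips.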
Let's look at that data now. There are 2,200 items in the table, for a total of 5 MB. Here are the DynamoDB stats:
Provisioned read capacity units 29 (Auto Scaling Enabled)
Provisioned write capacity units 25 (Auto Scaling Enabled)
Last decrease time October 24, 2018 at 4:34:34 AM UTC+2
UTC: October 24, 2018 at 2:34:34 AM UTC
Local: October 24, 2018 at 4:34:34 AM UTC+2
Region (Ireland): October 24, 2018 at 2:34:34 AM UTC
Last increase time October 24, 2018 at 12:22:07 PM UTC+2
UTC: October 24, 2018 at 10:22:07 AM UTC
Local: October 24, 2018 at 12:22:07 PM UTC+2
Region (Ireland): October 24, 2018 at 10:22:07 AM UTC
Storage size (in bytes) 5.05 MB
Item count 2,195
There are 5 provisioned read/write capacity units, with an autoscaling max of 300. But autoscaling doesn't behave as I'd expect: it went from 5 to 29, and could go up to 300, which would be enough to download 5 MB in 30 sec, yet it doesn't use them (I'm just getting started with autoscaling, so I guess it's misconfigured?).
Here we can see the effect of autoscaling, which does increase the number of read capacity units, but it does so too late, after the timeout has already happened. I've tried downloading the data several times in a row and didn't see much improvement, even with 29 units.
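One thing I plan to try is raising the autoscaling floor so that a burst doesn't start from 5 RCUs every time. A configuration sketch with the AWS CLI (the table name `results` is a placeholder; the service-namespace and scalable-dimension values are the documented ones for DynamoDB reads):

```shell
# Hypothetical example: raise the autoscaling minimum for reads so a
# burst doesn't have to ramp up from 5 RCUs. "table/results" is a placeholder.
aws application-autoscaling register-scalable-target \
  --service-namespace dynamodb \
  --resource-id "table/results" \
  --scalable-dimension "dynamodb:table:ReadCapacityUnits" \
  --min-capacity 50 \
  --max-capacity 300

# Inspect the current scaling policies to check the target utilization
aws application-autoscaling describe-scaling-policies \
  --service-namespace dynamodb
```

This doesn't make autoscaling react faster, it just makes the worst case (sitting at the floor when the export starts) less bad.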
The Lambda itself is configured with 128 MB of RAM; increasing it to 1024 MB has no effect (as I'd expect, which confirms the issue comes from the DynamoDB scan duration).
So all this makes me wonder why DynamoDB can't do in 30 sec what RethinkDB does in 3 sec. It's not related to any kind of indexing, since the operation is a Scan and therefore must go through all items in the DB, in any order.
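Since a Scan has no ordering requirement anyway, one mitigation I've read about is DynamoDB's parallel Scan: split the table into segments with the `Segment`/`TotalSegments` parameters and fetch them concurrently. A rough sketch, again assuming the aws-sdk v2 `DocumentClient` interface (`parallelScan` is my own illustrative helper name):

```javascript
// Sketch of a parallel Scan: each segment is paginated independently
// and all segments run concurrently via Promise.all.
async function parallelScan(docClient, params, totalSegments) {
  const segments = Array.from({ length: totalSegments }, (_, i) => i);
  const perSegment = await Promise.all(
    segments.map(async (segment) => {
      const items = [];
      let ExclusiveStartKey;
      do {
        const page = await docClient
          .scan(Object.assign({}, params, {
            Segment: segment,
            TotalSegments: totalSegments,
            ExclusiveStartKey,
          }))
          .promise();
        items.push(...page.Items);
        ExclusiveStartKey = page.LastEvaluatedKey;
      } while (ExclusiveStartKey);
      return items;
    })
  );
  // Flatten segment results into a single array
  return [].concat(...perSegment);
}
```

Of course this only helps if the table has enough provisioned read capacity to absorb the parallel requests, which loops back to the autoscaling problem.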
I wonder how I'm supposed to fetch that HUGE dataset (5 MB!) from DynamoDB to generate a CSV.
And I really wonder if DynamoDB is the right tool for the job. I wasn't expecting such low performance compared to what I've used in the past (Mongo, Rethink, Postgres, etc.).
I guess it all comes down to proper configuration (and there probably are many things to improve there), but even so, why is it such a pain to download a bunch of data? 5 MB is not a big deal, but it feels like it requires a lot of effort and attention, while exporting a single table is just a common operation (stats, dump for backup, etc.).
Edit: Since I created this question, I read https://hackernoon.com/the-problems-with-dynamodb-auto-scaling-and-how-it-might-be-improved-a92029c8c10b which explains in depth the issue I've run into. Basically, autoscaling is slow to trigger, which explains why it doesn't scale right for my use case. This article is a must-read if you want to understand how DynamoDB auto-scaling works.