1

I have an API which allows other microservices to call to check whether a particular product exists in the inventory. The API takes only one parameter: the ID of the product.

The API is served through API Gateway backed by a Lambda function, which simply queries a Postgres RDS instance to check for the product ID. If it finds the product, it returns the product's information in the response; if not, it returns an empty response. The SQL is basically this:

SELECT * FROM inventory WHERE expired = false AND product_id = request.productId;

However, the problem is that many services call this particular API very heavily to check the existence of products, and the calls often come in bursts. I assume those services loop through a list of product IDs and check each one for existence individually, hence the bursts.

The number of concurrent calls on the API results in many queries to the database. The rate can burst beyond 30 queries per second, and there can be a few hundred thousand requests to fulfil. The queries are mostly identical except for the product ID in the WHERE clause. The column is indexed and a query takes an average of only 5-8 ms to complete. Still, connections to the database occasionally time out when the rate gets too high.

I'm using Sequelize as my ORM, and the error I get when it times out is SequelizeConnectionAcquireTimeoutError. There is a good chance that the burst rate was too high and it maxed out the connection pool too.

Some options I have considered:

  • Using a cache layer. But I have noticed that, most of the time, 90% of the product IDs in the requests are not repeated. This means that 90% of the time it would be a cache miss and the query would still hit the database.
  • Auto-scaling the database. But because the calls are bursty and I don't know when they may come, the autoscaling won't complete in time to avoid the timeouts. Moreover, the query is a very simple SELECT statement and the CPU of the RDS instance hardly crosses 80% during the bursts, so I doubt scaling would help much either.

What other techniques can I use to keep the database from being hit so hard when the API gets burst calls which are mostly unique and difficult to cache?

Carven
  • 14,988
  • 29
  • 118
  • 161
  • Start by optimizing your database access if you haven't done so already. If you can, change `SELECT *` to `SELECT col,col,col` to avoid fetching data you don't need. Then add a so-called *compound covering index.* – O. Jones Jun 01 '20 at 18:07
  • How many records do you have in your `inventory` table and are they dynamic or static? – Saeed Jun 01 '20 at 18:24
  • What exactly is the timeout message? Is there any message (warning or error) in the PostgreSQL log? Could you post some `EXPLAIN ANALYZE` output for the SQL query? – pifor Jun 01 '20 at 18:26
  • @O.Jones In my actual code, the columns are selected individually and not *. I will check out the covering index. Thanks! – Carven Jun 01 '20 at 18:26
  • @Saeed There are about 500k records in the table. Some of the columns in the table are dynamic in the sense that their values can change if certain actions are applied to them or due to some cron job. For example, to indicate whether they are expired or not, the column “expired” will be set to true after a period of time. – Carven Jun 01 '20 at 18:29
  • @pifor I’m using Sequelize as my ORM and the error message is SequelizeConnectionAcquireTimeoutError. So there is actually a good chance that the burst rate was too high and my database pool ran out of connections, hence the timeout. – Carven Jun 01 '20 at 18:33
  • Right. The trouble with lambda is this: the lambda system just starts more instances when it's under heavy load. Each of those instances has its own connection pool. You *could* try configuring a very small connection pool (one or two connections) in each lambda instance. https://stackoverflow.com/questions/35525574/how-to-use-database-connections-pool-in-sequelize-js – O. Jones Jun 02 '20 at 15:57

2 Answers

0

Start with a covering index to handle your query. You might create an index like this for your table:

CREATE INDEX inv_lkup ON inventory (product_id, expired) INCLUDE (col, col, col);

Mention all the columns in your SELECT in the index, either in the main list of indexed columns or in the INCLUDE clause. Then the DBMS can satisfy your query completely from the index. It's faster.

You could start using AWS Lambda throttling to handle this problem. But for that to work, the consumers of your API will need to retry when they get 429 responses. That might be super-inconvenient.
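If you do go the throttling route, consumers would need something like the following backoff loop. This is just a sketch: `doRequest` stands in for whatever HTTP client the caller uses, and the attempt counts and delays are illustrative, not recommendations.

```javascript
// Retry on 429 with exponential backoff. `doRequest` is a placeholder
// for the caller's HTTP client; it should resolve to a response-like
// object with a `status` field.
async function callWithRetry(doRequest, maxAttempts = 5, baseDelayMs = 100) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await doRequest();
    if (res.status !== 429) return res;
    // Back off: 100 ms, 200 ms, 400 ms, ... before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
  throw new Error('gave up after repeated 429 responses');
}
```

Adding jitter to the delay would help avoid the retries themselves arriving in synchronized bursts.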

Sorry to say, you may need to stop using Lambda. Ordinary web servers have machinery built in to manage burst workloads.

They have an incoming connection (TCP/IP listen) queue. Each new request lands in that queue, where it waits until the server software accepts the connection. When the server is busy, requests wait in that queue; under high load they simply wait a bit longer. In Node.js's case, if you use clustering there's just one of these incoming connection queues, and all the processes in the cluster use it.

The server software you run (to handle your API) has a pool of connections to your DBMS. That pool has a maximum number of connections in it. As your server software handles each request, it awaits a connection from the pool. If no connection is immediately available, the request handling pauses until one becomes available, then proceeds. This too smooths out the requests to the DBMS. (Be aware that each process in a Node.js cluster has its own pool.)

Paradoxically, a smaller DBMS connection pool can improve overall performance, by avoiding too many concurrent SELECTs (or other queries) on the DBMS.
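Concretely, Sequelize lets you cap the pool per instance via its `pool` options. The numbers below are illustrative only; the right values depend on how many concurrent Lambda instances you expect and your DBMS's `max_connections`.

```javascript
// Illustrative Sequelize options keeping the per-instance pool tiny,
// so many concurrent Lambda instances don't exhaust Postgres connections.
const sequelizeOptions = {
  dialect: 'postgres',
  pool: {
    max: 2,       // at most 2 connections per Lambda instance
    min: 0,       // let idle instances drop their connections
    acquire: 3000, // ms to wait for a pooled connection before erroring
    idle: 1000,    // ms a connection may sit idle before release
  },
};

// const sequelize = new Sequelize(process.env.DATABASE_URL, sequelizeOptions);
```

When the `acquire` timeout is exceeded you get exactly the SequelizeConnectionAcquireTimeoutError described in the question, so a small pool trades occasional queueing inside each instance for fewer connections overall.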

This kind of server configuration can be scaled out: a load balancer will do, and so will a server with more cores and more Node.js cluster processes. An elastic load balancer can also add new server VMs when necessary.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
0

Use a cache loaded at boot time

You can load all necessary columns into an in-memory data store (Redis). Every update to the database (e.g. by the cron job) must also update the cached data.

Problems: memory usage; overhead of keeping the cache updated
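A minimal sketch of the boot-time warm-up. A plain `Map` stands in for Redis here so the logic is self-contained; with a Redis client you would use its set/get commands instead. `fetchAllProducts` is a hypothetical function that returns all non-expired rows.

```javascript
// In-memory stand-in for Redis; swap for a Redis client in production.
const cache = new Map();

// Warm the cache once at boot. `fetchAllProducts` is assumed to resolve
// to an array of rows shaped like { product_id, ...otherColumns }.
async function warmCache(fetchAllProducts) {
  for (const product of await fetchAllProducts()) {
    cache.set(product.product_id, product);
  }
}

// Existence check served entirely from memory; null means "not found".
function lookupProduct(productId) {
  return cache.get(productId) ?? null;
}
```

With ~500k rows this fits comfortably in memory if you cache only the columns the API actually returns, and the cron job that flips `expired` would delete those keys as part of its run.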

Limit DB calls

Create a buffer for IDs. Store n IDs and then make one query for all of them, or empty the buffer every m seconds.

Problems: increased client response time; extra processing to split the batched query result
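The buffering idea above could be sketched like this. `queryByIds` is a hypothetical function that runs a single `WHERE product_id IN (...)` query and resolves to a Map of id → row; the batch size and flush interval are illustrative.

```javascript
// Collect individual lookups and flush them as one batched query,
// either when the buffer reaches maxSize or after maxWaitMs.
class IdBuffer {
  constructor(queryByIds, maxSize = 100, maxWaitMs = 50) {
    this.queryByIds = queryByIds; // (ids) => Promise<Map<id, row>>
    this.maxSize = maxSize;
    this.maxWaitMs = maxWaitMs;
    this.pending = []; // queued { id, resolve } entries
    this.timer = null;
  }

  // Each caller gets a promise for its own product (or null).
  lookup(id) {
    return new Promise((resolve) => {
      this.pending.push({ id, resolve });
      if (this.pending.length >= this.maxSize) {
        this.flush();
      } else if (!this.timer) {
        this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
      }
    });
  }

  // Run one query for the whole batch and fan the rows back out.
  async flush() {
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    const batch = this.pending.splice(0);
    if (batch.length === 0) return;
    const rows = await this.queryByIds(batch.map((p) => p.id));
    for (const { id, resolve } of batch) resolve(rows.get(id) ?? null);
  }
}
```

The trade-off is exactly the one noted above: each caller may wait up to maxWaitMs longer, in exchange for the database seeing one IN-list query instead of up to maxSize point lookups.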

Change your database

Use a NoSQL database for this data. According to this article and this one, I think choosing a NoSQL database is a better idea.

Problems: multiple data stores

Saeed
  • 5,413
  • 3
  • 26
  • 40