
I'm looking to build a Java 11 Spring Boot application. The application needs to handle very high throughput, with traffic peaks and quiet periods.

The happy path of the application looks like this.

[Diagram: happy path]

Conceptually it's fairly straightforward. The steps roughly look like this (a code sketch follows the list):

  • Accept an incoming POST request carrying a DTO at a save endpoint.
  • Validate the DTO and return a relevant error message if it is invalid.
  • Convert the DTO to a database entity object.
  • Save the entity to a Postgres database.
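
In code, the happy path is roughly this (SomeDto, SomeEntity and SomeRepository are placeholder names, not real classes):

```java
import javax.validation.Valid;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SaveController {

    private final SomeRepository repository; // placeholder Spring Data repository

    public SaveController(SomeRepository repository) {
        this.repository = repository;
    }

    @PostMapping("/save")
    public ResponseEntity<Void> save(@Valid @RequestBody SomeDto dto) {
        // @Valid triggers bean validation; an invalid DTO becomes a 400 response
        SomeEntity entity = toEntity(dto); // conversion step (mapping omitted)
        repository.save(entity);           // one synchronous save per request
        return ResponseEntity.ok().build();
    }

    private SomeEntity toEntity(SomeDto dto) {
        return new SomeEntity(); // placeholder mapping
    }
}
```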

The potential issue with this application is that it performs a database save for every request, which adds up to a lot of individual saves. The connection pool can quickly run out of connections as load increases.

My alternative approach looks like this

[Diagram: internal queue]

I'm looking to return a 200 status once the incoming DTO passes validation and has been queued in an in-memory queue.
There is no external blocking call on the request path, and should the database go down, the internal queue gives some redundancy.
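
A rough sketch of the shape I'm imagining (the capacity, batch size and delay are made-up numbers, and the placeholder names from above are reused; whether this is the right queue type is part of my question below):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class SaveBuffer {

    // Bounded, thread-safe buffer. Whether a plain LinkedList or something
    // like this is the right choice is exactly what I'm asking.
    private final BlockingQueue<SomeDto> queue = new LinkedBlockingQueue<>(100_000);

    private final SomeRepository repository; // placeholder Spring Data repository

    public SaveBuffer(SomeRepository repository) {
        this.repository = repository;
    }

    public boolean enqueue(SomeDto dto) {
        return queue.offer(dto); // non-blocking; returns false when the buffer is full
    }

    // Requires @EnableScheduling on a configuration class
    @Scheduled(fixedDelay = 500)
    public void drain() {
        List<SomeDto> batch = new ArrayList<>();
        queue.drainTo(batch, 1_000); // pull up to 1000 queued DTOs per pass
        if (!batch.isEmpty()) {
            List<SomeEntity> entities = new ArrayList<>();
            for (SomeDto dto : batch) {
                entities.add(toEntity(dto));
            }
            repository.saveAll(entities); // one batched write instead of many single saves
        }
    }

    private SomeEntity toEntity(SomeDto dto) {
        return new SomeEntity(); // placeholder mapping
    }
}
```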

So, some questions and ideas:

  • Does this look like a good approach, and are there any pitfalls I should look out for?
  • Have you solved a similar issue in a better or different way?
  • Could reactive streams help in any way?
  • What internal Java libraries should I use for this? My thinking was to go with Java's LinkedList (Queue<SomeDto> myQ = new LinkedList<SomeDto>();) for queueing internally.
Robbo_UK
  • Your entire premise is based on your connection pool running out of connections. Did you challenge this premise or verify that it is actually happening? A database is usually able to handle many transactions per second, depending on the hardware you're using. Additionally, connection pools like Hikari will wait up to 30 seconds by default if no connections are available. If you're not relying on long transactions, you should ask yourself whether the added complexity is worth it. – g00glen00b Jul 03 '19 at 06:43
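
For reference, the pool behaviour g00glen00b mentions is configurable in HikariCP; a minimal sketch with illustrative values (the JDBC URL and pool size are invented):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    public static HikariDataSource dataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/app"); // placeholder URL
        config.setMaximumPoolSize(20);       // illustrative cap on concurrent connections
        config.setConnectionTimeout(30_000); // default: callers wait up to 30s for a free connection
        return new HikariDataSource(config);
    }
}
```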

3 Answers


What happens if the app fails with data still in the internal queue? Or if save operations overflow the available memory?

If you want to build something more robust, you may consider an event-log solution (based on Kafka, for example) with consumers populating the database (Kafka would replace your internal queue).
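
As a sketch (using Spring for Apache Kafka; the topic name "saves" and the SomeDto type are placeholders), the endpoint would publish to the log and a consumer would do the actual save:

```java
import javax.validation.Valid;

import org.springframework.http.ResponseEntity;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SaveController {

    private final KafkaTemplate<String, SomeDto> kafkaTemplate;

    public SaveController(KafkaTemplate<String, SomeDto> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @PostMapping("/save")
    public ResponseEntity<Void> save(@Valid @RequestBody SomeDto dto) {
        kafkaTemplate.send("saves", dto); // durable log instead of an in-memory queue
        return ResponseEntity.accepted().build(); // 202: accepted, persisted asynchronously
    }
}

@Component
class SaveConsumer {

    @KafkaListener(topics = "saves")
    public void onSave(SomeDto dto) {
        // convert to an entity and write to Postgres at the consumer's own pace;
        // unprocessed messages survive a crash because they live in the Kafka log
    }
}
```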

However, it is difficult to really answer your question here since many other elements must be taken into consideration.

I would suggest you read a book like Designing Data-Intensive Applications: it is definitely a valuable resource and will help you design a reliable solution based on your needs and your context.

Fouad HAMDI

A much better solution would be to have a redundant database so that in the event that one of the systems goes down or is otherwise unavailable, you can continue to function with your second database.

Keeping the data to persist in memory is a solution I would advise against. You say that you are anticipating relatively high peaks. If your DB is unavailable during a high peak, I cannot believe that you would be able to queue all requests in memory for the necessary length of time. And if they are only in memory, any application server crash (or hardware problem that affects your application server) would result in a complete loss of all your queued requests. This means your REST interface lied to its callers: you returned that you had successfully persisted the data when you had not, because both your DB and your application went down.

You either need a redundant database or a persistent, external queueing system. If you opt for an external queueing system (which can also be made redundant to prevent outages), then you could simply push all persist requests into the external queue. Then you only have one mechanism/workflow to support.

Nathan

If you are making a REST call, I don't think you can keep all the requests in the same linked list. You can use RabbitMQ for queueing. As soon as validation succeeds, you can push the object to the queue and return 200.
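
For example, with Spring AMQP (the exchange and routing key names, and the SomeDto type, are placeholders):

```java
import javax.validation.Valid;

import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SaveController {

    private final RabbitTemplate rabbitTemplate;

    public SaveController(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
    }

    @PostMapping("/save")
    public ResponseEntity<Void> save(@Valid @RequestBody SomeDto dto) {
        // push to the broker once validation passes, then return immediately
        rabbitTemplate.convertAndSend("saves-exchange", "saves", dto);
        return ResponseEntity.ok().build();
    }
}
```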

Pradyskumar
  • But doesn't that have the same issue as saving directly to a database? i.e. it still needs to wait for slower TCP traffic over the network? – Robbo_UK Jun 25 '19 at 14:04
  • Since you cannot rely on the internal memory queue, it is better to go for RabbitMQ or another messaging service. I am not sure, but I think publishing to RabbitMQ is faster than actually saving to the database if it is a large object. – Pradyskumar Jun 25 '19 at 14:30