
Example scenario: I have two APIs, an Account API and an API exposed to the user. The Account API provides a list of accounts (currently about 500 at minimum, and growing as more accounts are added). I want to fetch the accounts based on a filter and expose the data to the user through the other API.

I am using Spring's @Scheduled annotation to schedule a job that calls the Account API every 30 minutes.
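Roughly, the job looks like this (the endpoint URL and the Account DTO below are placeholders, not my actual code):

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class AccountSyncJob {

    private final RestTemplate restTemplate = new RestTemplate();

    // Runs every 30 minutes on every pod hosting this application
    // (scheduling is enabled via @EnableScheduling on a config class).
    @Scheduled(fixedRate = 30 * 60 * 1000)
    public void fetchAccounts() {
        // Placeholder endpoint and Account DTO, for illustration only.
        Account[] accounts = restTemplate.getForObject(
                "https://account-api.example.com/accounts", Account[].class);
        // ... apply the filter and store the result for the user-facing API
    }
}
```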

In a single Pod, the scheduler executes the job without any issue.

When the app is replicated to three Pods in Kubernetes, all three schedulers wake at the same time and the job runs in triplicate.

Expected Behaviour:

I want the schedulers to work in a coordinated fashion: if an account is being processed by one scheduler, another scheduler should pick up the next account. Essentially, I want the schedulers across pods to behave like the threads of a single multi-threaded program.

Something like: scheduler1 processes 200 accounts, scheduler2 processes the next 200, and scheduler3 processes the remaining 100.
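One way to get roughly that split without extra infrastructure would be deterministic partitioning by pod identity. A sketch, under assumptions that are not my current setup: the app runs as a Kubernetes StatefulSet, so each pod's hostname ends in a stable ordinal (account-sync-0, -1, -2), and the replica count is injected as an environment variable; fetchAllAccounts and process are placeholders:

```java
import java.util.List;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class PartitionedAccountJob {

    // Each pod processes only the accounts whose index maps to its
    // ordinal, so three pods split ~500 accounts roughly evenly.
    @Scheduled(fixedRate = 30 * 60 * 1000)
    public void fetchAndProcessMyShare() {
        List<Account> accounts = fetchAllAccounts();  // placeholder: call to the Account API
        int replicas = Integer.parseInt(System.getenv("REPLICA_COUNT")); // e.g. "3"
        String host = System.getenv("HOSTNAME");      // e.g. "account-sync-2"
        int ordinal = Integer.parseInt(host.substring(host.lastIndexOf('-') + 1));

        for (int i = 0; i < accounts.size(); i++) {
            if (i % replicas == ordinal) {
                process(accounts.get(i));             // placeholder per-account work
            }
        }
    }
}
```

The split stays correct only while the environment variable matches the actual replica count, so scaling the deployment needs a matching config change.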

I am new to Spring Boot application development and would like to know whether something like the above can be done.

I read about ShedLock, but it allows only one scheduler to run at a time. I would like to use all the schedulers and process the accounts faster.
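For reference, the ShedLock version would look roughly like this (lock name and durations are arbitrary); it guarantees at most one pod executes each run, which is exactly why it doesn't split the work:

```java
import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class LockedAccountJob {

    // Only the pod that acquires the lock runs the job; the other
    // two pods skip this run entirely.
    @Scheduled(cron = "0 */30 * * * *")
    @SchedulerLock(name = "fetchAccounts", lockAtMostFor = "PT25M")
    public void fetchAccounts() {
        // ... one pod fetches and processes all the accounts
    }
}
```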

vs922905
  • That is what your database can provide by writing a proper query and using row locks. Something like this https://vladmihalcea.com/database-job-queue-skip-locked/ – M. Deinum Dec 16 '21 at 07:37
  • @M.Deinum Actually, I fetch the data directly from a third-party API; we are not retrieving it from a database – vs922905 Dec 16 '21 at 07:41
  • Then there is little you can do. The only thing I can think of is using Spring Batch, which would run on one node and then use the other nodes to distribute the workload via a partitioned step. – M. Deinum Dec 16 '21 at 07:45
  • @M.Deinum Thanks for your advice. I will look into how Spring Batch works. – vs922905 Dec 16 '21 at 07:48
  • Maybe you should use a message broker for that (for example, Kafka or RabbitMQ). You can create a job (using ShedLock, Quartz, or something like that) that creates tasks for fetching data from the Account API and sends them to a queue. The same application (which consists of several instances, i.e. pods) should read messages from this queue and handle them. So your processing will be balanced across instances and resilient; see the sketch after these comments. – grolegor Dec 16 '21 at 07:51
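A minimal sketch of grolegor's broker idea, assuming Spring Kafka, a topic named account-tasks, and JSON (de)serialization configured for the placeholder Account type (topic, group, and helper names are all hypothetical):

```java
import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class AccountPipeline {

    private final KafkaTemplate<String, Account> kafkaTemplate;

    public AccountPipeline(KafkaTemplate<String, Account> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Producer: guarded by ShedLock so only one pod fans the accounts out.
    @Scheduled(cron = "0 */30 * * * *")
    @SchedulerLock(name = "enqueueAccounts", lockAtMostFor = "PT25M")
    public void enqueueAccounts() {
        for (Account account : fetchAllAccounts()) {  // placeholder fetch
            kafkaTemplate.send("account-tasks", account.getId(), account);
        }
    }

    // Consumer: runs on every pod. Kafka spreads the topic's partitions
    // across the consumer group, so the pods share the accounts.
    @KafkaListener(topics = "account-tasks", groupId = "account-processor")
    public void handle(Account account) {
        // ... process one account
    }
}
```

With enough partitions on the topic (at least one per pod), Kafka rebalances the work automatically when pods are added or removed.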
