
I have an API with a series of endpoints that all perform very long-running jobs, i.e. jobs that may take up to 48 hours to complete.

Of course, I can’t keep the client waiting for 48 hours to return a response, so I am looking for the best solution to handle these cases.

I have an idea of what to do, but I am unsure whether it is a worthwhile solution or how this is handled in production apps. Furthermore, I’d like to implement a way to cancel jobs while they’re running, if need be, and to update/monitor the overall progress of each job.

My current setup works as follows (a rough sketch of the kick-off endpoint is shown after the list):

  1. The API receives the request to start a long-running job

  2. An entry for the job info is stored in the DB with the job status set to PENDING

  3. That job info is placed in a RabbitMQ message and sent off by the RabbitMQ producer

  4. The job id is returned to the client that initiated the API call with a 202 Accepted status

  5. A RabbitMQ consumer receives the message with the job info and calls the class responsible for executing the long-running job; the job status is updated to IN-PROGRESS

  6. The client can then check on the status of the job via another endpoint that accepts the job id and returns the current status/info
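
Roughly, the endpoint that kicks everything off looks like the sketch below (simplified, Spring-style; `Job`, `JobStatus`, `JobRequest`, `JobResponse`, `JobRepository`, and the exchange/routing-key names are placeholders rather than my real ones):

```java
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Simplified sketch of steps 1-4. Job, JobStatus, JobRequest, JobResponse
// and JobRepository stand in for my own types.
@RestController
@RequestMapping("/jobs")
public class JobController {

    private final JobRepository jobRepository;
    private final RabbitTemplate rabbitTemplate;

    public JobController(JobRepository jobRepository, RabbitTemplate rabbitTemplate) {
        this.jobRepository = jobRepository;
        this.rabbitTemplate = rabbitTemplate;
    }

    @PostMapping
    public ResponseEntity<JobResponse> startJob(@RequestBody JobRequest request) {
        // 2. Persist the job info with status PENDING
        Job job = jobRepository.save(new Job(request.getType(), JobStatus.PENDING));

        // 3. Publish the job id so the consumer can pick it up
        rabbitTemplate.convertAndSend("jobs-exchange", "jobs.start", job.getId());

        // 4. Return the job id immediately with 202 Accepted
        return ResponseEntity.accepted().body(new JobResponse(job.getId(), job.getStatus()));
    }
}
```
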

I think this approach works, and it seems scalable, but I have a few concerns that someone could perhaps shed some light on or help me address:

A. I want to be able to kill a running job from another exposed endpoint if need be. What is the best way of accomplishing this? I was thinking that I could also persist the thread id with the job info in step 5, when the service updates the status to IN-PROGRESS and begins processing. Then, when the cancel-job endpoint is hit with a job id, I could kill the associated thread. Is that a viable solution, or is there a better way to handle it?
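
Roughly what I have in mind for cancellation is keeping a handle to the running task so the cancel endpoint can interrupt it. A minimal sketch (only valid while a single instance of the service owns the running job; class and method names are made up):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch: keep the Future of each running job in memory so a
// cancel endpoint can interrupt it. This only works while a single
// service instance owns the running job.
public class RunningJobRegistry {

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Map<Long, Future<?>> runningJobs = new ConcurrentHashMap<>();

    public void start(long jobId, Runnable work) {
        Future<?> future = executor.submit(work);
        runningJobs.put(jobId, future);
    }

    public boolean cancel(long jobId) {
        Future<?> future = runningJobs.remove(jobId);
        // cancel(true) interrupts the worker thread; the job code must check
        // Thread.interrupted() (or handle InterruptedException) to actually
        // stop and mark the job CANCELLED in the DB.
        return future != null && future.cancel(true);
    }
}
```
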

B. I would like to implement an update strategy that lets me quantify the overall progress of a job, so that instead of just seeing IN-PROGRESS or PENDING, the front end can show the percentage of the job that is complete. The front end will be a desktop app, and eventually I’d like to use this info to drive a progress bar. However, I’m concerned about performance, because I would need to write to the job table every time the progress is incremented, and also constantly read from the table while the client polls the status endpoint (I’m thinking every 10 seconds or so). Is there a better solution to handling this based on the given info?
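
What I picture is keeping the progress in memory and only flushing it to the job table when it has moved by a full percent, something like this sketch (`ProgressStore` is a made-up wrapper around the job table):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of in-memory progress with occasional persistence.
// ProgressStore is a made-up wrapper around the job table.
public class JobProgress {

    private final long jobId;
    private final ProgressStore store;                      // hypothetical DB accessor
    private final AtomicInteger percent = new AtomicInteger(0);
    private volatile int lastPersisted = 0;

    public JobProgress(long jobId, ProgressStore store) {
        this.jobId = jobId;
        this.store = store;
    }

    // Called by the worker as it makes progress.
    public void update(int newPercent) {
        percent.set(newPercent);
        // Only write to the DB when progress has moved by at least 1%,
        // so a 48-hour job produces roughly 100 writes in total.
        if (newPercent - lastPersisted >= 1) {
            lastPersisted = newPercent;
            store.savePercent(jobId, newPercent);
        }
    }

    // Served by the status endpoint when the worker and API share an instance.
    public int current() {
        return percent.get();
    }
}
```
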

If it makes any difference, only one job should be processing at a time. This is for an admin portal, so only a few people will have access to this feature, and if a specific job type is already IN-PROGRESS, then another job of that type won’t be allowed until it is COMPLETE, CANCELLED, or FAILED.
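
The guard before creating a new job is roughly this (simplified; `jobRepository` is the same placeholder as above, and the Spring Data style query method name is just illustrative):

```java
// Simplified guard: refuse to enqueue a job type that already has an
// active (PENDING or IN-PROGRESS) job. Names are illustrative only.
public Job startIfAllowed(JobType type) {
    boolean alreadyActive = jobRepository.existsByTypeAndStatusIn(
            type, java.util.List.of(JobStatus.PENDING, JobStatus.IN_PROGRESS));
    if (alreadyActive) {
        throw new IllegalStateException("A " + type + " job is already active");
    }
    return jobRepository.save(new Job(type, JobStatus.PENDING));
}
```
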

oznomal
  • If there is only one thread, storing its ID is unnecessary. You can just store *the* running task in some singleton bean and interrupt it. And if your job runs for 48 hours, updating/reading its progress every 10 seconds looks overkill. Updating/Reading it every half hour would get you 96 updates, so approximately 1% increase every half hour. You generally don't need a better precision than that for a progress bar. That said, updating/reading a column of a row every 10 seconds wouldn't have any significant impact. And you can also keep the progress in memory, too, along with your task. – JB Nizet Jul 07 '19 at 15:53
  • I don't really get the point of going through RabbitMQ just to start a thread on the same server. That seems overkill, too. And if the goal is to make it run on any server of a cluster, then your plan to interrupt the task wouldn't work. – JB Nizet Jul 07 '19 at 15:55
  • I see what you’re saying and a lot of it makes sense. I think I need to re-evaluate how to cancel the task. At the moment I only believe I’ll need one instance of the micro-service running, but I’d like this to be done in a way that I could spin up another instance if need be and it still function properly. – oznomal Jul 07 '19 at 20:27
  • [the SO thread](https://stackoverflow.com/questions/33009721/long-running-rest-api-with-queues) may help to clarify a similar design issue – Ham Sep 25 '22 at 10:37

1 Answer


A possible solution: when the API receives a request to start the job, return the job-id of the newly created job.

Have another endpoint to track the status of jobs and manage their life cycle. Something like:

/Jobs/Status/<Job-ID> 

GET would return the current status and percentage completion. DELETE could kill an existing running job.

You could poll the status of the job, as mentioned, every minute or every 10 minutes.
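
In Spring terms, that could look roughly like the following (illustrative names only; `JobService` and `JobStatusResponse` are not from your question, and the exact path is up to you):

```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.DeleteMapping;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Rough sketch of the status/lifecycle endpoint described above.
// JobService and JobStatusResponse are illustrative names.
@RestController
@RequestMapping("/jobs/status")
public class JobStatusController {

    private final JobService jobService;

    public JobStatusController(JobService jobService) {
        this.jobService = jobService;
    }

    // GET returns the current status and percentage complete.
    @GetMapping("/{jobId}")
    public JobStatusResponse status(@PathVariable long jobId) {
        return jobService.getStatus(jobId);
    }

    // DELETE cancels a running job.
    @DeleteMapping("/{jobId}")
    public ResponseEntity<Void> cancel(@PathVariable long jobId) {
        jobService.cancel(jobId);
        return ResponseEntity.noContent().build();
    }
}
```
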

asolanki