
I would like to run jobs, but since they may be long-running, I would like to know how far along they are while they execute. That is, the executor would regularly report its progress without ending the job it is executing. I have tried to do this with APScheduler, but it seems the scheduler can only receive event messages like EVENT_JOB_EXECUTED or EVENT_JOB_ERROR.

Is it possible to get information from an executor while it is executing a job?

Thanks in advance!

petibonum
  • If there is a big loop, you can increment a number that represents the progress of your work and print it. Or, if you want something visual, you can use this number as the length of a canvas that will act as a progress bar for your work. I don't know if you can do it with a module instead of creating the canvas (or the print), but this way works without adding much time to your computation. – ysearka Jun 25 '15 at 17:49
  • @ysearka : How can the executor and the scheduler both access the same number? The executor usually locks every object it is allowed to modify. Do I have to create an object for each job which can be updated by the executor and read by the scheduler, and then ask the scheduler to check that number regularly, or is there a way to tell the scheduler each time the number has been updated? – petibonum Jun 26 '15 at 07:38
  • It may help if you show a minimal working example. – boardrider Jun 26 '15 at 09:05
  • Pass a function as one of your job's arguments; it can be used as a callback function that you pass the progress/status to. Untested, but that's my educated guess. – Pieter Jul 23 '15 at 13:07

1 Answer


There is, I think, no particular support for this within APScheduler. This requirement has come up for me many times, and the best solution will depend on exactly what you need. Some possibilities:

Job status dictionary

The simplest solution would be to use a plain Python dictionary. Use the job's ID as the key, and whatever status information you require as the value. This works best if you only have one copy of each job running concurrently (max_instances=1), of course. If you need some structure to your status information, I'm a fan of namedtuples for this. Then, you either keep the dictionary as an evil global variable or pass it into each job function, as in the sketch below.
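A rough sketch of that; the job_status dict, the JobStatus namedtuple, and the 'my-job' ID are names I've made up for illustration, not anything APScheduler provides:

```python
from collections import namedtuple

from apscheduler.schedulers.background import BackgroundScheduler

JobStatus = namedtuple('JobStatus', ['done', 'total'])
job_status = {}  # maps job ID -> JobStatus

def long_job(job_id, items):
    for i, item in enumerate(items, start=1):
        # ... do the actual work on item here ...
        job_status[job_id] = JobStatus(done=i, total=len(items))

scheduler = BackgroundScheduler()
# with no trigger, the job is scheduled to run once, right away
scheduler.add_job(long_job, args=['my-job', list(range(133))], id='my-job')
scheduler.start()

# meanwhile, whatever is monitoring progress can read:
status = job_status.get('my-job')
if status is not None:
    print('%d/%d items processed' % (status.done, status.total))
```

Since the job only ever assigns a single dict entry, CPython's GIL makes this safe enough for progress reporting without any explicit locking.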

There are some drawbacks, though. The status information will stay in the dictionary forever unless you delete it. If you delete it at the end of the job, you don't get to read a 'job complete' status; otherwise, you have to make sure that whatever is monitoring the status definitely checks and clears every job. This of course isn't a big deal if you have a reasonably sized set of jobs/keys.

Custom dict

If you need some extra behavior, you can do as above but subclass dict (or UserDict or MutableMapping, depending on what you want).
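For example, here's one sketch (assuming Python 3, where UserDict lives in collections) of a status dict that also remembers when each entry was last updated, so a monitor can spot stalled jobs:

```python
import time
from collections import UserDict

class StatusDict(UserDict):
    """A status dictionary that records when each entry was last set."""
    def __setitem__(self, key, value):
        # store the status together with an update timestamp
        super().__setitem__(key, (value, time.time()))

statuses = StatusDict()
statuses['my-job'] = '20/133 items processed'
status, updated_at = statuses['my-job']
```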

Memcached

If you've got a memcached server you can use, storing the status reports in memcached works great, since they can expire automatically and they should be globally accessible to your application. One probably-minor drawback is that the status information could be evicted from the memcached server if it runs out of memory, so you can't guarantee that the information will be available.

A more major drawback is that this does require you to have a memcached server available. If you might or might not have one available, you can use dogpile.cache and choose the backend that's appropriate at the time.
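A rough sketch using dogpile.cache's memcached backend; the 'job-status:' key scheme is my own convention, and it assumes a memcached server on localhost plus the dogpile.cache and python-memcached packages:

```python
from dogpile.cache import make_region
from dogpile.cache.api import NO_VALUE

region = make_region().configure(
    'dogpile.cache.memcached',
    expiration_time=600,  # stale status entries expire on their own
    arguments={'url': '127.0.0.1:11211'},
)

def long_job(job_id, items):
    for i, item in enumerate(items, start=1):
        # ... do the actual work on item here ...
        region.set('job-status:' + job_id,
                   '%d/%d items processed' % (i, len(items)))

# meanwhile, whatever is monitoring progress can read:
status = region.get('job-status:my-job')
if status is not NO_VALUE:
    print(status)
```

Switching to another backend later (say, 'dogpile.cache.memory' during development) is then just a change to the configure() call.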

Something else

Pieter's comment about using a callback function is worth taking note of. If you know what kind of status information you'll need, but you're not sure how you'll end up storing or using it, passing a wrapper to your jobs will make it easy to use a different backend later.
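A sketch of that idea; report_progress and its signature are made up for illustration, and the point is that the job only ever sees the callback, never the storage behind it:

```python
from apscheduler.schedulers.background import BackgroundScheduler

def report_progress(job_id, done, total):
    # swap this body out later: update a dict, write to memcached, etc.
    print('%s: %d/%d items processed' % (job_id, done, total))

def long_job(job_id, items, progress_callback):
    for i, item in enumerate(items, start=1):
        # ... do the actual work on item here ...
        progress_callback(job_id, i, len(items))

scheduler = BackgroundScheduler()
scheduler.add_job(long_job,
                  args=['my-job', list(range(133)), report_progress],
                  id='my-job')
scheduler.start()
```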

As always, though, be wary of over-engineering your solution. If all you want is a report that says "20/133 items processed", a simple dictionary is probably enough.

Sopoforic