
I have ~250,000 recurring tasks each day, about a fifth of which might be updated with different scheduled datetimes each day.

Can this be done efficiently in Celery? I am worried about this `tick` method from Celery's `beat.py`, which scans every schedule entry on each iteration:

    def tick(self):
        """Run a tick, that is one iteration of the scheduler.

        Executes all due tasks.
        """
        remaining_times = []
        try:
            for entry in values(self.schedule):
                next_time_to_run = self.maybe_due(entry, self.publisher)
                if next_time_to_run:
                    remaining_times.append(next_time_to_run)
        except RuntimeError:
            pass

        return min(remaining_times + [self.max_interval])
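To get a rough feel for the cost of that linear scan, here is a micro-benchmark (not Celery itself; just a loop over 250,000 dummy entries doing a comparable amount of work per entry to `maybe_due` — a comparison and an append):

```python
import time

N = 250_000  # roughly the number of recurring tasks in question

# Dummy entries: (next_run_timestamp, task_name), spread over a day.
now = time.monotonic()
entries = [(now + (i % 86_400), 'task-%d' % i) for i in range(N)]

start = time.monotonic()
remaining_times = []
for next_run, name in entries:
    # Comparable per-entry work to tick(): compute the delta and collect it.
    delta = next_run - now
    if delta > 0:
        remaining_times.append(delta)
elapsed = time.monotonic() - start
print('one tick over %d entries: %.3f s' % (N, elapsed))
```

This only measures the Python-level iteration overhead, not `maybe_due`'s real work (schedule arithmetic, publishing due tasks), so treat it as a lower bound on per-tick cost.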
A T
  • That's about 3 things a second, although I assume these things will be clustered less nicely. So: try it with very very lightweight tasks and run top. 250k isn't an enormous number. – U2EF1 Jan 22 '14 at 05:05
  • We may be able to rewrite this to use the same algorithm as the timer, which uses a heapq. Probably at the cost of space and a more expensive update for runtime schedules, but I think it's possible. Something like: at startup, generate the heapq by calling `is_due` for all entries; when an entry is popped off the heapq and run, the heap is updated with the next `.is_due()` value. If the schedule is changed (e.g. if an entry is changed when using the database backend scheduler) then the heap must be regenerated from scratch. – asksol Jan 23 '14 at 14:32
  • Yeah, that would indeed be a much better approach :) – A T Jan 24 '14 at 04:48
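A minimal sketch of the heap-based approach asksol describes above. This is not Celery's actual scheduler API — `Entry`, `is_due`, and `HeapScheduler` are simplified stand-ins — but it shows the shape: build the heap once at startup, pop and run due entries, push them back with their next due time, and rebuild from scratch if the schedule changes externally:

```python
import heapq
import time

class Entry:
    """Simplified stand-in for a beat schedule entry."""
    def __init__(self, name, interval):
        self.name = name
        self.interval = interval  # seconds between runs
        self.next_run = time.monotonic() + interval

class HeapScheduler:
    def __init__(self, entries):
        self.ran = []  # names of entries executed, for demonstration
        # Build the heap once at startup instead of scanning every tick.
        # The name is a tiebreaker so entries themselves are never compared.
        self.heap = [(e.next_run, e.name, e) for e in entries]
        heapq.heapify(self.heap)

    def tick(self, now=None):
        """Run all due entries; return seconds until the next one is due."""
        now = time.monotonic() if now is None else now
        while self.heap and self.heap[0][0] <= now:
            _, _, entry = heapq.heappop(self.heap)
            self.run(entry)
            # Reschedule and push back with the next due time.
            entry.next_run = now + entry.interval
            heapq.heappush(self.heap, (entry.next_run, entry.name, entry))
        if not self.heap:
            return None
        return self.heap[0][0] - now

    def run(self, entry):
        self.ran.append(entry.name)

    def rebuild(self, entries):
        """If the schedule changes externally (e.g. a database-backed
        scheduler edits an entry), regenerate the heap from scratch."""
        self.__init__(entries)
```

With this structure each tick costs O(k log n) for k due entries rather than O(n) for the whole schedule, which is the difference that matters at 250k entries.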

0 Answers