
I have fairly unusual behavior I need to achieve with Celery. I understand that it is not recommended to have tasks block at all, but I think it is necessary here, as I describe below. Pseudocode:

Task 1:
Set event to false
Start group of task 2
Scrape website every few seconds to check for changes
If changes found, set event

Task 2:
Log into website with selenium.
Block until event from Task 1 is set
Perform website action with selenium

I want task2 to be executed multiple times in parallel for multiple users. If every instance of task2 checked the website for updates itself, that would produce a large number of requests to the website, which is not acceptable.

For a normal flow like this, I would use task1 to start the login tasks in a group, then start another group to execute the action tasks once the condition has been met. However, the web action is time-sensitive, and I don't want to re-open a new selenium instance (which would defeat the purpose of having this structure in the first place).

I've seen examples like this: Flask Celery task locking, but using a Redis cache seems unnecessary for this application (and it does not need to be atomic, because the 'lock' is only modified by task1). I've also looked into Celery's remote control, but I'm not sure whether it can block until a signal is received.

There is a similar question here which was solved by splitting the task I want to block into 2 separate tasks, but again I can't do this.

Jeremy
    I'm torn on answering this. It smells like a "wait until on-line tickets are available, then grab as many as possible" problem. Can you say a bit more about what you're trying to accomplish? – Dave W. Smith Aug 03 '20 at 02:41
  • @DaveW.Smith It's not for any application to scalp tickets, but to get reservations for my university gym. It's intended to be a scheduling assistant (as every appointment becomes available 48 hours in advance), but it wouldn't be very good if the schedule you created couldn't be acquired! Thanks. – Jeremy Aug 03 '20 at 02:50

1 Answer


Celery tasks can themselves enqueue tasks, so it's possible to wait for an event like "it's 9am", and then spawn off a bunch of parallel tasks. If you need to launch an additional task on the completion of a group of parallel tasks (i.e., if you need a fan-in task at the completion of all fan-out tasks), the mechanism you want is chords.

Dave W. Smith
  • Thanks Dave! Unfortunately, I've already considered this option. The goal of blocking here is to have each task already running its own selenium browser, with the user logged in **before** the event. Your solution would let me execute this flow, but it doesn't address my goal of having selenium completely set up and ready to go for each task. Opening the browsers only after the event would slow the process down considerably, which is why I didn't just use a group in the first place. – Jeremy Aug 03 '20 at 04:03
  • I think I see what you're trying to do, and celery is the wrong tool. What might work is a bunch of threads, each running selenium/webdriver and 'logged in', blocking on some signal to proceed, with an additional thread providing that signal. – Dave W. Smith Aug 03 '20 at 04:05
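The thread-based shape suggested in that comment can be sketched with nothing but the stdlib. Everything here is illustrative: the selenium login and the website action are reduced to a comment and a list append, and the watcher sets the event immediately instead of scraping:

```python
import threading

ready = threading.Event()
results = []


def worker(user_id):
    # In the real flow this would start selenium and log the user in,
    # so the browser is already warm when the event fires.
    ready.wait()              # true blocking wait, no polling
    results.append(user_id)   # stands in for the time-sensitive action


def watcher():
    # Stands in for the scrape-until-changed loop: when the change is
    # detected, set() releases every blocked worker at once.
    ready.set()


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
watcher()
for t in threads:
    t.join()
print(sorted(results))  # -> [0, 1, 2]
```

Unlike the polling approaches, `Event.wait()` parks each thread until `set()` is called, so all workers wake essentially simultaneously.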
  • Ok thank you, the one thing I saw that made it seem possible with celery was the usage of the custom context manager as seen in: [Flask Celery task locking](https://stackoverflow.com/questions/53950548/flask-celery-task-locking). I figured creating something like that which would be able to use blocking on reading the cache value instead of setting and deleting it would be perfect. The reason I chose celery for this is because running selenium is memory and cpu demanding, and scalability is therefore very important as a few users could outgrow a single machine with the multithreaded approach. – Jeremy Aug 03 '20 at 04:24
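The blocking-read idea from that last comment can be sketched as a small helper, independent of any particular cache. Here `read_flag` is any callable that reads the shared value (it would be a Redis GET in the linked approach, but an in-memory dict stands in below); as noted above, no atomicity is needed because only task1 ever writes the flag:

```python
import time


def wait_for_flag(read_flag, poll_interval=0.5, timeout=None):
    """Block until read_flag() returns a truthy value.

    read_flag: zero-argument callable that reads the shared flag.
    Raises TimeoutError if the flag is not set within `timeout` seconds.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not read_flag():
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("flag was never set")
        time.sleep(poll_interval)


# Usage with an in-memory stand-in for the cache:
cache = {"changed": False}
cache["changed"] = True                    # what task1 would do
wait_for_flag(lambda: cache["changed"], poll_interval=0.01)
```

This is still polling under the hood, so the multithreaded `Event.wait()` approach remains cheaper per waiter; the trade-off, as the comment says, is that Celery workers can be spread over multiple machines when selenium outgrows one box.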