I have 10 Selenium-based scraper tasks that are scheduled and run with Celery, using Redis as the broker.
Selenium drives Firefox (65.0.1, in headless mode) through geckodriver to scrape the data.
The problem I am facing is that when a Celery worker executes a task, it spawns Firefox processes to do the scraping, and even though I call driver.quit()
at the end of every task, Firefox and geckodriver processes continue to persist.
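
For context, each task is structured roughly like the sketch below (Selenium 3.x). The task name, broker URL, and scraping logic are placeholders rather than my actual code, but the driver lifecycle is the same:

```python
from celery import Celery
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

app = Celery("scrapers", broker="redis://localhost:6379/0")  # placeholder broker URL

@app.task
def scrape(url):
    options = Options()
    options.add_argument("--headless")  # Firefox 65 in headless mode
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        data = driver.page_source  # real parsing omitted
    finally:
        driver.quit()  # called at the end of every task
    return data
```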
That is the crux of my issue: the leftover processes end up eating most of my RAM, and eventually there isn't enough memory left for other scraper tasks to run.
Irrespective of the amount of RAM, the Selenium task shouldn't be leaving zombie processes when driver.quit()
is called.
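
This is how I check for the leftovers between runs (a psutil-based sketch, assuming psutil is installed; the same processes show up with `ps aux | grep -E 'firefox|geckodriver'`):

```python
import psutil

# List any firefox/geckodriver processes still alive after a task has finished
for proc in psutil.process_iter(["pid", "name", "status"]):
    name = (proc.info["name"] or "").lower()
    if "firefox" in name or "geckodriver" in name:
        print(proc.info)  # these keep showing up even after driver.quit()
```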
Any suggestions to resolve this would be great.