
I have 10 Selenium-based scraper tasks that are scheduled and run with Celery, using Redis as the broker.

Selenium drives Firefox (65.0.1, headless mode) through geckodriver to scrape the data.

The problem I am facing is that after a Celery worker executes a task that spawns Firefox processes to scrape, some Firefox and geckodriver processes continue to persist, even though I call driver.quit() at the end of every task.
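
For context, each task is laid out roughly like this (a simplified sketch; the task name, URL handling, and app setup are placeholders, not my exact code):

    from celery import Celery
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    app = Celery("scrapers", broker="redis://localhost:6379/0")

    @app.task
    def scrape_site(url):
        options = Options()
        options.headless = True      # Firefox 65.0.1 in headless mode
        driver = webdriver.Firefox(options=options)
        try:
            driver.get(url)
            return driver.page_source    # real scraping logic omitted
        finally:
            driver.quit()                # called at the end of every task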

That is the crux of my issue: the leftover processes end up eating most of my RAM, and eventually there is not enough memory left for other scraper tasks to run.

Irrespective of the amount of RAM, a Selenium task shouldn't be leaving zombie processes behind when driver.quit() is called.

Any suggestions to resolve this would be great.

  • What version of Firefox are you using? – Stemado Mar 20 '19 at 19:40
  • I am using 65.0.1 – rohit keshav Mar 20 '19 at 19:45
  • https://stackoverflow.com/a/45057141/9105725 recommends killing the process. That seems extreme, but if you have the latest Selenium and Firefox versions, the non-extreme method doesn't seem to be working for you anyway. –  Mar 20 '19 at 19:47
  • Since multiple scrapers run concurrently, a killall firefox after one scraper finishes would also kill another scraper that is still running. Is there a way to target and kill only the Firefox processes spawned by that specific scraper's driver (see the sketch below)? – rohit keshav Mar 20 '19 at 19:52
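
To make that targeted-kill idea concrete, here is a sketch of killing only one driver's process tree, assuming geckodriver runs locally (so driver.service.process exposes its PID) and that psutil is installed; kill_driver_tree is a hypothetical helper, not a Selenium API:

    import psutil

    def kill_driver_tree(driver):
        """Kill only the geckodriver/Firefox tree owned by this driver."""
        process = getattr(getattr(driver, "service", None), "process", None)
        if process is None:
            return  # remote driver, or service already torn down
        try:
            parent = psutil.Process(process.pid)       # geckodriver's PID
        except psutil.NoSuchProcess:
            return  # the tree is already gone
        for child in parent.children(recursive=True):  # the Firefox processes
            child.kill()
        parent.kill()

Calling this in each task's finally block, after driver.quit(), would sweep up whatever quit() leaves behind without touching the Firefox instances owned by other concurrently running scrapers.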

0 Answers