
I built a crawler that fetches product information for a list of products entered by the user. Sometimes the crawler freezes, especially if the list of products is long and the crawler runs in headless mode.

The bug seems random and is not reproducible, which makes me suspect it depends on the load on the website being crawled.

Since this is a non-reproducible bug, I don't think I can fix it directly, but is there a way to detect that the crawler has frozen and then try again?

Here is some information about the crawler and the bug:

  • The crawler is built using Selenium and Python.

  • The bug occurs with different websites and products.

  • The bug occurs in "normal" mode, but more often in headless mode.

Thanks!

Felipe

  • Did you try this solution: [Selenium determine browser is frozen](https://stackoverflow.com/questions/14528001/selenium-determine-browser-is-frozen)? – Kafels May 21 '19 at 23:48
  • Without seeing your code we would be guessing but, Headless has less overhead than normal which means you're probably running into a race condition with the site. You might try `driver.set_script_timeout()`? – Marcel Wilson May 22 '19 at 05:20
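The timeouts mentioned in the comment above can be combined with a retry loop so a frozen browser is detected and replaced rather than hanging the whole crawl. This is a rough sketch, not the asker's actual code: `make_driver` is an illustrative factory argument (e.g. `lambda: webdriver.Chrome()`), and the timeout and retry counts are arbitrary.

```python
def fetch_with_retry(url, make_driver, max_retries=3, timeout=30):
    """Load url, retrying with a fresh browser if the driver hangs or crashes."""
    last_error = None
    for attempt in range(max_retries):
        driver = make_driver()  # illustrative factory, e.g. lambda: webdriver.Chrome()
        try:
            driver.set_page_load_timeout(timeout)  # hung page loads now raise
            driver.set_script_timeout(timeout)     # hung async scripts now raise
            driver.get(url)
            return driver.page_source
        except Exception as exc:  # Selenium raises TimeoutException / WebDriverException
            last_error = exc      # this attempt froze or crashed; try again
        finally:
            driver.quit()         # always release the (possibly stuck) browser
    raise RuntimeError(f"gave up on {url} after {max_retries} attempts") from last_error
```

With this shape, a load that exceeds `timeout` seconds raises inside `get()`, the stale browser is thrown away in `finally`, and the next iteration retries with a brand-new one.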

1 Answer


If the problem isn't the browser itself, it is likely that your code is simply busy fetching data. In normal mode you can at least watch the browser working; in headless mode that activity is invisible, so the program looks frozen even when it is making progress.

I assume you built a GUI. If so, the same thread that handles the GUI is also doing the crawling, so the GUI stops responding for as long as the crawl runs.

You can solve this with the threading library (or another concurrency mechanism, such as multiprocessing). Running the crawl in a separate worker lets the GUI keep handling events while a website is being crawled, so it no longer freezes.
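A minimal sketch of that idea, assuming the crawl can be wrapped in a function. `crawl_products`, the product names, and the queue are illustrative stand-ins, not the original code; the real Selenium work would go inside the worker function.

```python
import queue
import threading

def crawl_products(products, results):
    """Worker thread: the real Selenium crawling would happen here."""
    for name in products:
        results.put((name, f"data for {name}"))  # stand-in for scraped data

results = queue.Queue()  # thread-safe channel from the worker back to the GUI
worker = threading.Thread(
    target=crawl_products,
    args=(["widget", "gadget"], results),
    daemon=True,  # don't keep the app alive if the GUI exits first
)
worker.start()

# A real GUI would poll results.get_nowait() from its event loop so the
# window keeps repainting; joining here just makes this sketch run standalone.
worker.join()
while not results.empty():
    print(results.get())
```

The key design point is that the GUI thread never calls into Selenium directly: it only starts the worker and drains the queue, so slow or stuck page loads cannot block the interface.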

MetinOnt