I am building a web-scraping Python script based on Selenium, BeautifulSoup, and ChromeDriver. The code worked just fine and was able to scrape data from several pages of the desired website, but after running for some time it threw an error that I could not fix:

File "lib.py", line 310, in <module>
    shop.scrapProductData(masterdata)    
  File "lib.py", line 103, in scrapeProductData
    Chrome = webdriver.Chrome()
  File "/home/philgun/.local/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
    desired_capabilities=desired_capabilities)
  File "/home/philgun/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/philgun/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/philgun/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/philgun/.local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

I am not really sure why it crashed randomly. The logic that I implemented was as follows (a rough sketch of the flow comes after the list):

  1. Open the website, scrape the URLs of the stores I want to scrape data from, then close the browser.
  2. Within a for loop, open each store's URL.
  3. For each store, scrape the product URLs, then close the browser once I have them.
  4. Using a for loop again, open each product URL and scrape the HTML data that I want, then close the browser.
  5. Dump everything to a JSON file.
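
In code, the overall flow is roughly the following. This is a simplified sketch, not the real lib.py: START_URL and extract_links() are placeholders for my actual entry page and parsing logic.

    import json

    from bs4 import BeautifulSoup
    from selenium import webdriver

    START_URL = "https://..."  # placeholder for the real entry page

    def get_soup(url):
        """Open a fresh browser, load one page, and always close the browser."""
        driver = webdriver.Chrome()
        try:
            driver.get(url)
            return BeautifulSoup(driver.page_source, "html.parser")
        finally:
            driver.quit()  # runs even if get() raises, so no Chrome is left behind

    def extract_links(soup):
        """Placeholder for the real selectors in lib.py."""
        return [a["href"] for a in soup.find_all("a", href=True)]

    masterdata = []
    for store_url in extract_links(get_soup(START_URL)):         # steps 1-2
        for product_url in extract_links(get_soup(store_url)):   # steps 3-4
            soup = get_soup(product_url)
            title = soup.title.string if soup.title else None
            masterdata.append({"url": product_url, "title": title})

    with open("masterdata.json", "w") as f:                      # step 5
        json.dump(masterdata, f, indent=2)

The try/finally is there so that driver.quit() runs even when a page load raises, and no stray Chrome process is left behind.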

I make sure that the web driver is closed each time the HTML data from a URL has been scraped and turned into a BeautifulSoup object. Any comments, suggestions, or discussion would be much appreciated. I don't know how to reproduce my code here, so here is a link to the code hosted on my personal GitHub instead:

https://github.com/philgun/coolstuff/blob/master/adena/tokopedia/lib.py

PS: Lines 15-20 of lib.py were just added, based on Soheil Pourbafrani's comment on this thread:

https://stackoverflow.com/questions/50642308/webdriverexception-unknown-error-devtoolsactiveport-file-doesnt-exist-while-t

I haven't tested them yet, though.
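
For reference, the change amounts to passing extra startup flags to Chrome, roughly like this (a sketch following that thread's suggestion, not yet verified against my crash):

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--no-sandbox")             # needed when Chrome runs as root or in a container
    options.add_argument("--headless")               # no visible window, less memory pressure
    options.add_argument("--disable-dev-shm-usage")  # use /tmp instead of the small /dev/shm

    driver = webdriver.Chrome(options=options)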

Thanks a lot! Cheers, PG

