
I've written a complex web scraper in a notebook that takes a long time to run. I have started using Databricks, and I want to run this script on my Databricks cluster so that I can run the scraper without relying on my local server.

However, I haven't been able to set up the environment correctly. There are many Stack Overflow posts on this, but I haven't been able to figure it out yet.

My current setup is:

%pip install selenium
%pip install chromedriver
%pip install webdriver_manager
%pip install beautifulsoup4


import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.core.utils import ChromeType

import time
from bs4 import BeautifulSoup
import pickle as pkl



service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

Depending on the code I'm trying, I get one of these errors:

WebDriverException: Message: unknown error: cannot find Chrome binary

WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/chromium-browser is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x55c30eb24d93......

I'm currently getting the second error. If anyone can tell me how to configure my environment so that my code runs, I would really appreciate it!
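For reference, the `DevToolsActivePort file doesn't exist` error is commonly associated with launching Chrome without headless flags on a host that has no display. The sketch below shows what such a configuration might look like. It is untested on Databricks and assumes a Chrome or Chromium binary is already installed on the cluster; `HEADLESS_FLAGS` and `build_driver` are hypothetical names, and the default binary path is simply the one reported in the error message above.

```python
# Flags commonly suggested for running Chrome on a Linux host without a display.
# Whether they are sufficient on a Databricks cluster is an assumption.
HEADLESS_FLAGS = [
    "--headless",               # run without a visible window
    "--no-sandbox",             # Chrome's sandbox usually refuses to start as root
    "--disable-dev-shm-usage",  # avoid relying on a possibly small /dev/shm
]


def build_driver(binary_path="/usr/bin/chromium-browser", service=None):
    """Create a Chrome driver with the headless flags applied (hypothetical helper).

    binary_path: where Chrome/Chromium is assumed to be installed.
    service: optional selenium Service, e.g. Service(ChromeDriverManager().install()).
    """
    # Imported lazily so the flag list can be inspected without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for flag in HEADLESS_FLAGS:
        options.add_argument(flag)
    options.binary_location = binary_path
    return webdriver.Chrome(service=service, options=options)
```

Calling `build_driver(service=Service(ChromeDriverManager().install()))` would reuse the driver setup from my code above, with the only change being the added options.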

I've tried a few things from Stack Overflow:

  1. A script with an updated chromedriver link: cannot get selenium webdriver to work in azure databricks
  2. I started on this one but got confused, and I'm not sure I need all of it: How to use Selenium in Databricks and accessing and moving downloaded files to mounted storage and keep Chrome and ChromeDriver versions in sync?
  3. Similar, but no answer: Using Selenium within Databricks (chrome not reachable)
user11781