
We have some Python scripts that scrape websites, and they work well. Now we want to do the same in Azure Databricks. We thought the following post in the Databricks forum would solve this, but unfortunately it doesn't work: (https://forums.databricks.com/questions/15480/how-to-add-webdriver-for-selenium-in-databricks.html?childToView=21347#answer-21347)

The error we get after running the last bit of code is:

    WebDriverException: Message: unknown error: cannot find Chrome binary (Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Linux 4.15.0-1050-azure x86_64)

The last bit of code looks like this:

    %py
    from selenium import webdriver

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_driver = "/tmp/chromedriver/chromedriver"
    driver = webdriver.Chrome(chrome_driver, chrome_options=chrome_options)
    driver.get("https://www.google.com")

I found a post (Selenium gives "selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary" on Mac) which says I have to give the location of the binary:

    options.binary_location = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"

But I don't know where this binary is located in Azure Databricks.
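To check whether any Chrome or Chromium binary exists on the cluster at all, and where, one option is to search PATH for the usual executable names with Python's `shutil.which`. This is a minimal sketch: the candidate names are assumptions based on common Linux package names, not locations documented by Databricks, and `find_chrome_binary` is a hypothetical helper. If it returns `None`, no browser is installed on the driver node, which would be consistent with the "cannot find Chrome binary" error.

```python
import shutil

# Assumed executable names for Chrome/Chromium on Linux; adjust as needed.
CANDIDATES = ("chromium-browser", "chromium", "google-chrome", "google-chrome-stable")


def find_chrome_binary(candidates=CANDIDATES, path=None):
    """Return the full path of the first candidate found, or None.

    `path` is an optional search-path string; when it is None,
    shutil.which falls back to the PATH environment variable.
    """
    for name in candidates:
        location = shutil.which(name, path=path)
        if location is not None:
            return location
    return None


# Possible use in the notebook (requires selenium and an installed browser):
#   binary = find_chrome_binary()
#   if binary is None:
#       raise RuntimeError("no Chrome/Chromium binary found on PATH")
#   chrome_options.binary_location = binary
```

A `None` result means the fix is installing a browser, not pointing `binary_location` somewhere.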

Chandan
jbazelmans
  • I am also getting the following error: "WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home" – Mohan Parashetti Apr 21 '21 at 11:44

1 Answer


Well, I got it to work after a small change to the original script:

    %sh /databricks/python3/bin/pip3 install selenium
    ==================
    %sh
    wget https://chromedriver.storage.googleapis.com/73.0.3683.68/chromedriver_linux64.zip \
      -O /tmp/chromedriver_linux64.zip
    ==================
    %sh mkdir /tmp/chromedriver
    ==================
    %sh
    unzip /tmp/chromedriver_linux64.zip -d /tmp/chromedriver/
    ==================
    %sh
    sudo add-apt-repository ppa:canonical-chromium-builds/stage
    ==================
    %sh
    /usr/bin/yes | sudo apt update
    ==================
    %sh
    /usr/bin/yes | sudo apt install chromium-browser
    ==================
    from selenium import webdriver

    # The ChromeDriver major version must match the installed Chromium version.
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_driver = "/tmp/chromedriver/chromedriver"
    driver = webdriver.Chrome(chrome_driver, chrome_options=chrome_options)
    driver.get("https://www.google.com")

This script installed Chromium version 77, while the ChromeDriver was version 73. Changing the link to download ChromeDriver 77 did the trick:

    %sh
    wget https://chromedriver.storage.googleapis.com/77.0.3865.40/chromedriver_linux64.zip \
      -O /tmp/chromedriver_linux64.zip
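Because the failure came down to a major-version mismatch between Chromium and ChromeDriver, it can help to compare the two before starting the driver. A minimal sketch, assuming both executables accept `--version` and print a dotted four-part version number; the URL pattern is the one used in the wget commands in this answer, while the function names are hypothetical helpers, not part of Selenium:

```python
import re
import subprocess

# Download URL pattern taken from the answer's wget commands.
DRIVER_URL = "https://chromedriver.storage.googleapis.com/{version}/chromedriver_linux64.zip"

# Matches a four-part version such as 73.0.3683.68 and captures the major part.
_VERSION_RE = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def major_version(version_output):
    """Extract the major version from text like 'ChromeDriver 73.0.3683.68 (...)'."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def installed_version_output(executable):
    """Run `<executable> --version` and return what it prints on stdout."""
    result = subprocess.run([executable, "--version"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


def versions_match(browser_output, driver_output):
    """True when the browser and ChromeDriver share the same major version."""
    return major_version(browser_output) == major_version(driver_output)


def driver_zip_url(version):
    """Build the ChromeDriver zip URL for a full version such as '77.0.3865.40'."""
    return DRIVER_URL.format(version=version)


# Possible use on the cluster (paths as in the answer):
#   browser = installed_version_output("chromium-browser")
#   driver = installed_version_output("/tmp/chromedriver/chromedriver")
#   if not versions_match(browser, driver):
#       print("mismatch:", browser.strip(), "vs", driver.strip())
```

Comparing only the major version reflects the fix described above: Chromium 77 worked once ChromeDriver 77 was downloaded in place of 73.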
jbazelmans
  • How did you add Chromedriver to PATH in Databricks? I'm getting the error `WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home` – jeppoo1 Oct 02 '20 at 11:32
  • @jeppoo1 I am running into the same error with this script. Did you ever figure it out? – Andrew Hicks Nov 08 '21 at 20:27
  • @AndrewHicks sorry, I did not figure it out, we had to find another way to complete the task – jeppoo1 Nov 10 '21 at 08:42
  • @AndrewHicks see my Q&A here for a complete solution: https://stackoverflow.com/questions/67830079/. I explain how to set up Selenium, Chrome, and ChromeDriver and keep the versions synced. – kindofhungry Nov 19 '21 at 07:12
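Regarding the PATH error in these comments: with the Selenium 3.x API used here, calling `webdriver.Chrome()` without a driver path should make Selenium look up an executable named `chromedriver` on PATH. Passing the full path, as the answer does with `/tmp/chromedriver/chromedriver`, avoids that lookup. As an alternative, here is a minimal sketch that prepends the unzip directory to the notebook process's PATH, assuming the driver was unzipped to `/tmp/chromedriver` as in the answer; `add_to_path` and `chromedriver_location` are hypothetical helpers:

```python
import os
import shutil

# Directory the answer unzips ChromeDriver into.
DRIVER_DIR = "/tmp/chromedriver"


def add_to_path(directory, env=os.environ):
    """Prepend `directory` to env["PATH"] unless it is already listed.

    With the default env this only affects the current Python process
    (and child processes it starts), not other notebooks or nodes.
    """
    current = env.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if directory not in parts:
        env["PATH"] = os.pathsep.join([directory] + parts)
    return env["PATH"]


def chromedriver_location():
    """Return where `chromedriver` resolves on the current PATH, or None."""
    return shutil.which("chromedriver")


# Possible use in a Python cell, after the %sh cells that unzip the driver:
#   add_to_path(DRIVER_DIR)
#   print(chromedriver_location())  # expected: /tmp/chromedriver/chromedriver
#   driver = webdriver.Chrome(options=chrome_options)
```

Passing the explicit path remains the simpler option; updating PATH mainly helps when other code calls `webdriver.Chrome()` without one.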