We have some Python scripts that scrape websites, and they work well locally. Now we want to run them in Azure Databricks. We thought the following post in the Databricks forum gave us the solution, but unfortunately it doesn't work: https://forums.databricks.com/questions/15480/how-to-add-webdriver-for-selenium-in-databricks.html?childToView=21347#answer-21347
The error we get after running the last bit of code is:
WebDriverException: Message: unknown error: cannot find Chrome binary (Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Linux 4.15.0-1050-azure x86_64)
The last bit of code looks like this:
%py
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_driver = "/tmp/chromedriver/chromedriver"
driver = webdriver.Chrome(chrome_driver, chrome_options=chrome_options)
driver.get("https://www.google.com")
I found a post that says to set the location of the Chrome binary explicitly (Selenium gives "selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary" on Mac):
options.binary_location = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
But that path is for macOS, and I don't know where this binary is located in Azure Databricks.
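To narrow this down, here is a minimal sketch of how one could check whether a Chrome or Chromium binary is present on the driver node at all. The candidate names in CANDIDATE_NAMES and the helper find_chrome_binary are assumptions made for illustration, not something taken from the forum post; which names actually exist depends on how (and whether) Chrome was installed on the cluster.

```python
import os
import shutil

# Assumed executable names for Chrome/Chromium on Linux; adjust as needed.
CANDIDATE_NAMES = [
    "google-chrome",
    "google-chrome-stable",
    "chromium-browser",
    "chromium",
    "chrome",
]


def find_chrome_binary(names=CANDIDATE_NAMES, extra_dirs=()):
    """Return the path of the first matching executable, or None.

    Searches PATH first, then any additional directories passed in
    extra_dirs (hypothetical parameter for non-standard install locations).
    """
    for name in names:
        path = shutil.which(name)  # looks the name up on PATH
        if path:
            return path
    for directory in extra_dirs:
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


chrome_binary = find_chrome_binary()
print(chrome_binary)  # None means no candidate was found in the searched locations
```

If this returns a path, that path could presumably be assigned to chrome_options.binary_location before creating the driver. If it returns None, the likely explanation is that only chromedriver was downloaded and Chrome itself is not installed on the cluster, in which case no binary_location value would help.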