
I want to use the Python module Selenium to do web scraping from a Jupyter notebook. The notebook runs in a Docker container that has no web browser installed. I want to be able to distribute the notebook so that other users can reproduce the scraping. The notebook runs on a shared JupyterLab container, and it is not possible to update the container image to include a browser.

I have tried a number of things:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

And this:

!pip install chromedriver-binary
from selenium import webdriver
import chromedriver_binary  # Adds chromedriver binary to path

driver = webdriver.Chrome('/opt/conda/lib/python3.7/site-packages/chromedriver_binary')

For this last case, I located the installed package using:

import chromedriver_binary
print(chromedriver_binary.__file__)

Unfortunately, I have not been able to make either approach work.

emil banning
  • Which OS is used in the Docker container? This answer shows how to install the Selenium webdriver for Google Colab running on Ubuntu: https://stackoverflow.com/questions/51046454/how-can-we-use-selenium-webdriver-in-colab-research-google-com/54077842#54077842 – Alexandra Dudkina Sep 18 '20 at 11:35

1 Answer


ChromeDriver depends on a local installation of Chrome, so you'll have to modify the Docker image you're using to install Chrome first.
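To confirm that a missing browser is what is happening in your container, a quick check like the one below may help. It is only a minimal sketch: the list of executable names is my assumption about how Chrome/Chromium is usually packaged on Linux, and `find_browser` is a hypothetical helper, not part of Selenium.

```python
import shutil

# Executable names commonly used by Chrome/Chromium packages on Linux.
# This list is an assumption for illustration and is not exhaustive.
CANDIDATE_BROWSERS = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_browser(candidates=CANDIDATE_BROWSERS):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


browser_path = find_browser()
if browser_path is None:
    print("No Chrome/Chromium binary found on PATH; chromedriver has nothing to launch.")
else:
    print(f"Found browser at {browser_path}")
```

If this reports that no browser was found, installing only the driver (via `webdriver_manager` or `chromedriver-binary`) will probably not be enough on its own.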

lscoughlin
  • You are technically correct, but I'm using a containerized instance of JupyterLab, where I cannot modify the Docker image. So I'm hoping to find a workaround and install the browser afterwards. – emil banning Sep 15 '20 at 12:08
  • A bit late to the party, but I had the same issue, so I created a Jupyter stack with scraper tools and added it to the community stacks: https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#community-stacks – GriffoGoes Apr 01 '22 at 01:49