68

I want to use Selenium Webdriver of Chrome in colab.research.google.com for fast processing. I was able to install Selenium using !pip install selenium but the webdriver of chrome needs a path to webdriverChrome.exe. How am I suppose to use it?

P.S.- colab.research.google.com is an online platform which provides GPU for fast computational problems related to deep learning. Please refrain from solutions such as webdriver.Chrome(path).

arihant jain
  • 115
  • 1
  • 8
john mich
  • 2,477
  • 3
  • 17
  • 32
  • I think I mentioned "colab.research.google.com". I know how webdriver works on a local machine. But as colab research google is an online platform which provides GPU for fast machine learning processing problems, I want to use webdrive on this above mentioned online platform. – john mich Jun 26 '18 at 16:52
  • A same problem is in this link: https://stackoverflow.com/questions/54327654/how-can-i-insert-path-environmental-variable-for-geckodriver-in-goggle-colab – Hamed Baziyad Jan 23 '19 at 13:42
  • seems like it was asked 7 days ago – john mich Jan 31 '19 at 18:33
  • @Dimanjan Hey, I have stopped trying this. Use-case was scrapped and so did not explored further. – john mich Jul 10 '19 at 09:24

10 Answers10

123

Recently Google collab was upgraded and since Ubuntu 20.04+ no longer distributes chromium-browser outside of a snap package, you can install a compatible version from the Debian buster repository:

%%shell
# Ubuntu no longer distributes chromium-browser outside of snap
#
# Proposed solution: https://askubuntu.com/questions/1204571/how-to-install-chromium-without-snap

# Add debian buster
cat > /etc/apt/sources.list.d/debian.list <<'EOF'
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
EOF

# Add keys
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A

apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg
apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg
apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg

# Prefer debian repo for chromium* packages only
# Note the double-blank lines between entries
cat > /etc/apt/preferences.d/chromium.pref << 'EOF'
Package: *
Pin: release a=eoan
Pin-Priority: 500


Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300


Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
EOF

# Install chromium and chromium-driver
apt-get update
apt-get install chromium chromium-driver

# Install selenium
pip install selenium

Then you can run selenium like this:

from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.headless = True
wd = webdriver.Chrome('chromedriver',options=chrome_options)
wd.get("https://www.webite-url.com")
Thomas
  • 1,823
  • 1
  • 8
  • 24
34

this one worked in colab

!pip install selenium
!apt-get update 
!apt install chromium-chromedriver

from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
Shaina Raza
  • 1,474
  • 17
  • 12
22

I made my own library to make it easy.

!pip install kora -q
from kora.selenium import wd
wd.get("https://www.website.com")

PS: I forget how I searched and experimented until it worked. But I first wrote and shared it in this gist in Dec 2018.

korakot
  • 37,818
  • 16
  • 123
  • 144
11

Don't have enough repu to comment. :(

However @Thomas answer still works in 06.10.2021, but with just one simple change since right of the bat you'll get DeprecationWarning: use options instead of chrome_options

Working code below:

!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',options=options)
wd.get("https://stackoverflow.com/questions/51046454/how-can-we-use-selenium-webdriver-in-colab-research-google-com")
wd.title
IcebergBuilder
  • 111
  • 1
  • 2
3

to use selenium in GOOGLE COLAB do the next steps in the colab notebook

!pip install kora -q

HOW TO USE IT INSIDE COLAB :

from kora.selenium import wd
wd.get("enter any website here")

YOU CAN ALSO USE IT WITH Beautiful Soup

import bs4 as soup
wd.get("enter any website here")
html = soup.BeautifulSoup(wd.page_source)
Mohamed TOUATI
  • 372
  • 2
  • 4
2

Google collab is using Ubuntu 20.04 now and you can't install chromium-browser without snap. But you can install it by using .deb files for ubuntu 18.04 at security.ubuntu.com/ubuntu/pool/universe/c/chromium-browser/.

I made a python script for this purpose. It finds latest version of chromium-browser and chromedriver for 18.04 and installs it for your google colab which has Ubuntu 20.04.

Site's links have been updated regularly. You don't need debian repository and apt keys.

import os
import re
import subprocess
import requests

# The deb files we need to install
deb_files_startstwith = [
    "chromium-codecs-ffmpeg-extra_",
    "chromium-codecs-ffmpeg_",
    "chromium-browser_",
    "chromium-chromedriver_"
]

def get_latest_version() -> str:
    # A request to security.ubuntu.com for getting latest version of chromium-browser
    # e.g. "112.0.5615.49-0ubuntu0.18.04.1_amd64.deb"
    url = "http://security.ubuntu.com/ubuntu/pool/universe/c/chromium-browser/"
    r = requests.get(url)
    if r.status_code != 200:
        raise Exception("status_code code not 200!")
    text = r.text

    # Find latest version
    pattern = '<a\shref="chromium\-browser_([^"]+.ubuntu0\.18\.04\.1_amd64\.deb)'
    latest_version_search = re.search(pattern, text)
    if latest_version_search:
        latest_version = latest_version_search.group(1)
    else:
        raise Exception("Can not find latest version!")
    return latest_version

def download(latest_version: str, quiet: bool):
    deb_files = []
    for deb_file in deb_files_startstwith:
        deb_files.append(deb_file + latest_version)

    for deb_file in deb_files:
        url = f"http://security.ubuntu.com/ubuntu/pool/universe/c/chromium-browser/{deb_file}"

        # Download deb file
        if quiet:
            command = f"wget -q -O /content/{deb_file} {url}"
        else:
            command = f"wget -O /content/{deb_file} {url}"
        print(f"Downloading: {deb_file}")
        # os.system(command)
        !$command

        # Install deb file
        if quiet:
            command = f"apt-get install /content/{deb_file} >> apt.log"
        else:
            command = f"apt-get install /content/{deb_file}"
        print(f"Installing: {deb_file}\n")
        # os.system(command)
        !$command

        # Delete deb file from disk
        os.remove(f"/content/{deb_file}")

def check_chromium_installation():
    try:
        subprocess.call(["chromium-browser"])
        print("Chromium installation successfull.")
    except FileNotFoundError:
        print("Chromium Installation Failed!")

def install_selenium_package(quiet: bool):
    if quiet:
        !pip install selenium -qq >> pip.log
    else:
        !pip install selenium

def main(quiet: bool):
    # Get the latest version of chromium-browser for ubuntu 18.04
    latest_version = get_latest_version()
    # Download and install chromium-browser for ubuntu 20.04
    download(latest_version, quiet)
    # Check if installation succesfull
    check_chromium_installation()
    # Finally install selenium package
    install_selenium_package(quiet)

if __name__ == '__main__':
    quiet = True # verboseness of wget and apt
    main(quiet)

And try selenium

from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
wd = webdriver.Chrome('chromedriver', options=chrome_options)
wd.get("https://www.google.com")
print(f"Page title: {wd.title}")
0

colab and selenium How can data be extracted from a whoscored.com?

#    https://www.whoscored.com

# install chromium, its driver, and selenium
!apt update
!apt install chromium-chromedriver
!pip install selenium
# set options to be headless, ..
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results
wd = webdriver.Chrome(options=options)
wd.get("https://www.whoscored.com")
print(wd.page_source)  # results
db fayen
  • 1
  • 1
0

Install Library

!pip install selenium
!apt-get update
!apt install chromium-chromedriver

And set up a chrome driver

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Set the path to the chromedriver executable
chromedriver_path = '/usr/bin/chromedriver'

# Set the Chrome driver options
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

# Start the Chrome driver
driver = webdriver.Chrome(service=Service(executable_path=chromedriver_path), options=options)

# Navigate to a website
driver.get('https://www.example.com')

# Quit the driver
driver.quit()
0

if you are taking any error like "WebDriverException: Message: Service chromedriver unexpectedly exited. Status code was: 1"

in notebook page Ctrl + Shift + P , choose "Use fallback runtime version" after try again.

Berhan Ay
  • 21
  • 3
-6

You can can rid of using .exe file by using WebDriverManager so instead of this

System.setProperty("webdriver.gecko.driver", "driverpath/.exe");
WebDriver driver = new FirefoxDriver();

you will be writing this

WebDriverManager.firefoxdriver().setup();
WebDriver driver = new FirefoxDriver();

All you need is add the dependecy to the POM file(Im assuming you using maven or some build tool) Please see my full answer about how to use this in this link Using WebdriverManager

mbn217
  • 884
  • 7
  • 14