
I was trying to run a headless Chrome browser with Selenium to scrape content from the web. I downloaded ChromeDriver using wget and then unzipped it in my current folder.

!wget "http://chromedriver.storage.googleapis.com/2.25/chromedriver_linux64.zip"
!unzip chromedriver_linux64.zip

Now, when I load the driver:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import os
# instantiate a Chrome options object so you can set the size and headless preference
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=1920,1080")  # Chrome expects "width,height"

# path to the chromedriver binary unzipped into the current folder
chrome_driver = os.path.join(os.getcwd(), "chromedriver")
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=chrome_driver)

I get the following error:

WebDriverException                        Traceback (most recent call last)
<ipython-input-67-0aeae0cfd891> in <module>()
----> 1 driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=chrome_driver)
      2 driver.get("https://www.google.com")
      3 lucky_button = driver.find_element_by_css_selector("[name=btnI]")
      4 lucky_button.click()
      5 

/usr/local/lib/python3.6/dist-packages/selenium/webdriver/chrome/webdriver.py in __init__(self, executable_path, port, chrome_options, service_args, desired_capabilities, service_log_path)
     60             service_args=service_args,
     61             log_path=service_log_path)
---> 62         self.service.start()
     63 
     64         try:

/usr/local/lib/python3.6/dist-packages/selenium/webdriver/common/service.py in start(self)
     84         count = 0
     85         while True:
---> 86             self.assert_process_still_running()
     87             if self.is_connectable():
     88                 break

/usr/local/lib/python3.6/dist-packages/selenium/webdriver/common/service.py in assert_process_still_running(self)
     97             raise WebDriverException(
     98                 'Service %s unexpectedly exited. Status code was: %s'
---> 99                 % (self.path, return_code)
    100             )
    101 

WebDriverException: Message: Service /content/chromedriver unexpectedly exited. Status code was: -6

Update

After some research, I tried another approach:

!apt install chromium-chromedriver
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('headless')

driver = webdriver.Chrome(chrome_options=options)

On Google Colab, this again gives me the same error:

WebDriverException: Message: Service chromedriver unexpectedly exited. Status code was: -6
  • @DebanjanB The 3 questions don't have a relevant answer to this question. The OS is different (Win vs. Linux) and no accepted answers. – korakot Nov 29 '18 at 09:59
  • None of the answers is related to error code -6. I tried all the methods. Please do not mark this as a duplicate before reading the entire question. – Himanshu Poddar Nov 29 '18 at 10:04
  • @KorakotChaovavanich The main error is **chromedriver unexpectedly exited**. Different _OS_ and different _Selenium Language Binding Arts_ will show different `Status code` for the same error. I have pointed OP to the most relevant discussions. Let me know if you have further concerns. – undetected Selenium Nov 29 '18 at 10:06
  • Yes, I do: none of the solutions you pointed to resolve my issue. – Himanshu Poddar Nov 29 '18 at 10:07
  • I think the intent here is to scrape the web; why are you not using BeautifulSoup instead? – MD5 Nov 29 '18 at 10:20
  • Beautiful Soup cannot scrape JavaScript-generated content. – Himanshu Poddar Nov 29 '18 at 12:57
  • Please share a notebook that reproduces the problem you observe. – Bob Smith Dec 03 '18 at 04:50

3 Answers


I have found out why I was getting the error. Install chromium-chromedriver and copy the chromedriver binary into /usr/bin so that it is on your PATH.
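To confirm the PATH step worked before starting Selenium, a quick check can be sketched. It assumes, as in the commands in this answer, that the apt package places the driver in /usr/lib/chromium-browser; `find_chromedriver` is a hypothetical helper, not a Selenium API. When given the bare name `chromedriver`, Selenium 3 launches the driver through `subprocess`, which resolves the name on PATH, so `shutil.which` checks the same thing.

```python
import os
import shutil

# Directory the answer's `cp` command copies the driver from (apt package location).
APT_DRIVER_DIR = "/usr/lib/chromium-browser"


def find_chromedriver(extra_dirs=(APT_DRIVER_DIR,)):
    """Return the full path of an executable 'chromedriver', or None.

    Searches PATH first (what Selenium relies on for the bare name
    'chromedriver'); if nothing is found there, searches extra_dirs instead.
    Hypothetical helper for illustration only.
    """
    on_path = shutil.which("chromedriver")
    if on_path:
        return on_path
    return shutil.which("chromedriver", path=os.pathsep.join(extra_dirs))


driver_path = find_chromedriver()
if driver_path is None:
    print("chromedriver not found: copy it into a PATH directory such as /usr/bin")
elif os.path.dirname(driver_path) not in os.environ.get("PATH", "").split(os.pathsep):
    print(f"found {driver_path}, but it is not on PATH; copy it or pass the path explicitly")
else:
    print(f"Selenium should find chromedriver at {driver_path}")
```

If the second message appears, the `cp` into /usr/bin did not take effect.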

This is the complete solution for scraping data with Selenium on Colab. Another method uses PhantomJS, but Selenium has deprecated its PhantomJS support, and it is expected to be removed in a future Selenium release.

# install chromium, its driver, and selenium
!apt-get update
!apt install -y chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium
# set options to be headless, etc.
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')             # commonly needed when Chrome runs as root
options.add_argument('--disable-dev-shm-usage')  # avoid the small /dev/shm found in containers
# open it, go to a website, and get results
# chromedriver is now on PATH (/usr/bin), so no explicit driver path is passed
wd = webdriver.Chrome(options=options)
wd.get("https://www.website.com")
print(wd.page_source)  # results
wd.quit()  # shut down the browser and the chromedriver process

This works for anyone who wants to scrape data on Google Colab rather than on a local machine. Execute the steps sequentially, in the order shown.

You can find the notebook here: https://colab.research.google.com/drive/1GFJKhpOju_WLAgiVPCzCGTBVGMkyAjtk
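The two extra flags in the code above address container-specific problems: Chrome refuses to start as root with its sandbox enabled (hence `--no-sandbox`), and `--disable-dev-shm-usage` makes Chrome write its shared-memory files to /tmp instead of /dev/shm, which Docker limits to 64 MB by default. Rather than adding them unconditionally, one could pick them at runtime. The sketch below is a hypothetical helper, not part of Selenium, and its 512 MB threshold is an arbitrary assumption.

```python
import os


def chrome_container_args(shm_path="/dev/shm", min_shm_mb=512):
    """Pick headless-Chrome flags for a root-run container such as Colab.

    Hypothetical helper; the min_shm_mb threshold is an assumption.
    """
    args = ["--headless"]
    # Chrome will not start as root with its sandbox enabled.
    if hasattr(os, "geteuid") and os.geteuid() == 0:
        args.append("--no-sandbox")
    try:
        st = os.statvfs(shm_path)
        shm_mb = st.f_frsize * st.f_blocks / (1024 * 1024)
    except (OSError, AttributeError):  # missing path, or no statvfs (Windows)
        shm_mb = 0.0
    # A small /dev/shm can make Chrome crash on large pages;
    # this flag moves those shared-memory files to /tmp.
    if shm_mb < min_shm_mb:
        args.append("--disable-dev-shm-usage")
    return args


# Usage with Selenium, as in the answer:
# options = webdriver.ChromeOptions()
# for arg in chrome_container_args():
#     options.add_argument(arg)
print(chrome_container_args())
```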

  • From the warning "DeprecationWarning: use options instead of chrome_options". Is there a reason to use chrome_options instead of options? – korakot Dec 03 '18 at 13:05
  • Also, I don't think sys.path.insert() is necessary. sys.path is for searching python modules, not executables. – korakot Dec 03 '18 at 13:09
  • No, sys.path.insert() is not necessary; you can remove it. I have edited my answer. As for options, I could not find its equivalent in the new release. – Himanshu Poddar Dec 03 '18 at 13:19
  • @KorakotChaovavanich I was thinking there was a problem in the statement webdriver.ChromeOptions(), but the real cause was somewhere else. Thanks, by the way. – Himanshu Poddar Dec 03 '18 at 13:49
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/184633/discussion-between-himanshu-poddar-and-korakot-chaovavanich). – Himanshu Poddar Dec 03 '18 at 13:51
  • Can you also explain about --disable-dev-shm-usage and --no-sandbox. Just curious. Is there information why it is needed somewhere? – korakot Dec 03 '18 at 15:20
  • This worked for me last time, but when I tried again on my server I got the following exception: File "/usr/local/lib/python3.8/dist-packages/selenium/webdriver/common/service.py", line 104, in start raise WebDriverException("Can not connect to the Service %s" % self.path) selenium.common.exceptions.WebDriverException: Message: Can not connect to the Service chromedriver. Can you help, please? I'm banned from asking questions on Stack Overflow. – zaheer Apr 10 '21 at 17:17
  • Works fine for me still! – Himanshu Poddar Apr 10 '21 at 18:06

This error message...

WebDriverException: Message: Service /content/chromedriver unexpectedly exited. Status code was: -6

...implies that the ChromeDriver exited unexpectedly.
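The `-6` itself carries information. In Selenium 3, `assert_process_still_running` reads the code with `subprocess.Popen.poll()`, and on POSIX systems a negative return code `-N` means the process was terminated by signal `N` rather than exiting on its own. Signal 6 is `SIGABRT`, so the chromedriver binary aborted. A minimal sketch for decoding such values (`describe_status` is a hypothetical helper):

```python
import signal


def describe_status(code):
    """Translate the 'Status code was: N' value from Selenium's error message.

    Selenium takes N from subprocess.Popen.poll(). On POSIX systems a negative
    value -N means the process was terminated by signal N instead of exiting.
    """
    if code is None:
        return "still running"
    if code < 0:
        signum = -code
        try:
            name = signal.Signals(signum).name
        except ValueError:  # number not known to this platform
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


print(describe_status(-6))
```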

Your main issue is an incompatibility between the versions of the binaries you are using:

  • As per the line of code:

    !wget "http://chromedriver.storage.googleapis.com/2.25/chromedriver_linux64.zip"
    
  • You are using chromedriver=2.25

  • The release notes of chromedriver=2.25 clearly mention the following:

    Supports Chrome v53-55

  • Although you haven't mentioned your Chrome browser version, presumably you are using one of the latest Chrome releases.

So there is a clear mismatch between ChromeDriver v2.25 and the recently released Chrome browser versions.

Solution

I am not sure about google-colaboratory. The bottom line is that you have to use the ChromeDriver version that matches the installed version of Google Chrome.
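Checking for a match can be automated by comparing the `--version` output of both binaries. A sketch under stated assumptions: from ChromeDriver 70 onward the driver's major version tracks Chrome's major version, while 2.x releases each support a range, of which only 2.25 (Chrome v53-55, from the release notes quoted above) is filled in. `driver_matches_browser` is a hypothetical helper, and the version strings in the examples are illustrative samples.

```python
import re

# ChromeDriver 2.x numbers do not follow Chrome's; each release supports a range.
# Only 2.25 (Chrome v53-55, per its release notes) is filled in here.
LEGACY_SUPPORT = {"2.25": range(53, 56)}


def _major_minor(version_output):
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def driver_matches_browser(driver_output, browser_output):
    """Compare `chromedriver --version` output with the browser's `--version` output."""
    d_major, d_minor = _major_minor(driver_output)
    b_major, _ = _major_minor(browser_output)
    if d_major == 2:  # legacy numbering: look the release up
        supported = LEGACY_SUPPORT.get(f"{d_major}.{d_minor}")
        if supported is None:
            raise LookupError("unknown 2.x ChromeDriver; check its release notes")
        return b_major in supported
    # From ChromeDriver 70 on, the major version tracks Chrome's major version.
    return d_major == b_major


print(driver_matches_browser("ChromeDriver 2.25.426924", "Chromium 70.0.3538.110"))
```

In a notebook, the two strings would come from running `chromedriver --version` and the browser's `--version` (for example via `subprocess`).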

However, you need to find a way to install Chrome or Chromium on Colab first. Then, you can use !wget and !unzip to download, unzip and start using the matching ChromeDriver version.
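The unzip step can also be done from Python, with one pitfall: Python's `zipfile` does not restore Unix permission bits (the `unzip` tool does), so the extracted driver is not executable until you `chmod` it. A sketch, assuming the archive was already downloaded (e.g. with `!wget` as in the question); `extract_chromedriver` is a hypothetical helper.

```python
import os
import stat
import zipfile


def extract_chromedriver(zip_path, dest_dir="."):
    """Unzip a ChromeDriver archive and make the binary executable.

    zipfile does not restore Unix permission bits, so the chmod below
    is needed before Selenium can launch the extracted file.
    """
    with zipfile.ZipFile(zip_path) as zf:
        members = [n for n in zf.namelist() if os.path.basename(n) == "chromedriver"]
        if not members:
            raise FileNotFoundError(f"no chromedriver inside {zip_path}")
        path = zf.extract(members[0], dest_dir)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


# e.g. after downloading chromedriver_linux64.zip:
# driver_path = extract_chromedriver("chromedriver_linux64.zip")
```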

You can find a discussion on the compatibility between ChromeDriver and Chrome Browser in this discussion

  • I am doing this on Colab. Why do I need to install Google Chrome in the cloud? – Himanshu Poddar Nov 29 '18 at 12:55
  • @DebanjanB Please provide a relevant answer to 'Google Colaboratory'. It's a free Jupyter Notebook service running ubuntu 17 at https://colab.research.google.com There is no 'Project Workspace', no IDE, no Rebuild, no Revo. You can't reboot it like your laptop. – korakot Nov 29 '18 at 13:23
  • @HimanshuPoddar I am not sure about `google-colaboratory`. The bottom line is that you have to use the _ChromeDriver_ that matches the installed _Google Chrome_ version. – undetected Selenium Nov 29 '18 at 13:25
  • In Colab, you can only install things through `!apt` or `!apt-get` or `!wget` or `!pip install`. You can't interact with the command line, so you need to use the -y option. – korakot Nov 29 '18 at 13:25
  • The hard things are: 1. How to install Chrome through command line? 2. How to check that Chrome is installed correctly? 3. How to check if ChromeDriver work correctly with Chrome? The rest is easy: !pip install selenium, then test it. – korakot Nov 29 '18 at 13:38
  • @KorakotChaovavanich You don't even need to install Chrome, just use the one whichever is installed. You can always use `!wget` and `!unzip` to download and start using the matching _ChromeDriver_. – undetected Selenium Nov 29 '18 at 13:40
  • @DebanjanB There is no browser installed already in Colab. You only have wget and curl. You need to install either Chrome or Chromium or even Firefox. But I am not sure how to install them. – korakot Nov 29 '18 at 13:42
  • @KorakotChaovavanich You will need a browser installed either among Chrome or Chromium or even Firefox to execute any test through Selenium. – undetected Selenium Nov 29 '18 at 13:48
  • @DebanjanB That's right. The key issue here is how to install a browser. I gave one example below installing phantomjs. Please make one for Chrome, if you can. – korakot Nov 29 '18 at 13:50
  • @KorakotChaovavanich But the problem with PhantomJS is that support for it is going to be removed in Selenium's next version. – Himanshu Poddar Nov 29 '18 at 14:19
  • @DebanjanB Please edit your answer about Chrome installation. *Test Environment* is Colab. And it doesn't come with Chrome installed. – korakot Nov 29 '18 at 15:10
  • @KorakotChaovavanich **`WebDriverException`** is a _Selenium_-specific exception and is fairly generic across _Chrome_, _Chrome Canary_ and _google-colaboratory_, where the main reason is a plain mismatch of the _binary versions_. Whether _google-colaboratory_ allows a _Chrome_ browser upgrade is a separate question altogether and would be best answered by _google-colaboratory_ contributors. However, as the end result is a _WebDriverException_, IMO we need to keep this information for the benefit of future readers. – undetected Selenium Nov 29 '18 at 18:47
  • @DebanjanB What you said here "just use the existing installation of Google Chrome whichever is installed in your Test Environment." is simply wrong. Please correct it. The rest is useful, thanks! – korakot Nov 30 '18 at 05:31
  • @KorakotChaovavanich As the question is tagged with _google-colaboratory_ and you being a valued contributor to _google-colaboratory_ tag, you are always welcome to make the minor adjustments to make this/any content useful to future readers. – undetected Selenium Nov 30 '18 at 06:09
  • @DebanjanB Ok, corrected them. I can't install Chrome/Chromium myself so a bit embarrassed at this, though. Hope someone who succeeds can make an example Colab notebook for us. – korakot Nov 30 '18 at 06:21
  • @KorakotChaovavanich Thanks for the useful edit. If _install Chrome or Chromium on Colab first_ is the only way out, I think I have a better solution to offer. However that would be _out of scope_ for this question and you have to raise a new question with your new requirement. – undetected Selenium Nov 30 '18 at 06:22
  • @KorakotChaovavanich While you raise the question please be precise that you want to execute _Selenium_ related tests on _google-colaboratory_ which doesn't have _Chrome_ pre-installed. Ask about the best ideas/options. I think community will accept the question happily. – undetected Selenium Nov 30 '18 at 06:30
  • @DebanjanB Please answer it here. So anyone searching can find the answer easily. I start a bounty as a thank you for that. – korakot Dec 03 '18 at 02:25

This may not directly help you, but if you eventually can't get Chrome + Selenium installed, you can still use PhantomJS + Selenium, as in this notebook:

https://colab.research.google.com/drive/1V62zhjw2V5buxdN1s9mqkLzh3FWqSq8S

But I would prefer Chrome, if possible.
