
I am trying to use Selenium from Python to scrape some dynamic pages that use JavaScript. However, I cannot launch Firefox after following the Selenium instructions on the PyPI page (http://pypi.python.org/pypi/selenium). I installed Firefox on AWS Ubuntu 12.04. The error message I got is:

In [1]: from selenium import webdriver

In [2]: br = webdriver.Firefox()
---------------------------------------------------------------------------
WebDriverException                        Traceback (most recent call last)
/home/ubuntu/<ipython-input-2-d6a5d754ea44> in <module>()
----> 1 br = webdriver.Firefox()

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.pyc in __init__(self, firefox_profile, firefox_binary, timeout)
     49         RemoteWebDriver.__init__(self,
     50             command_executor=ExtensionConnection("127.0.0.1", self.profile,
---> 51             self.binary, timeout),
     52             desired_capabilities=DesiredCapabilities.FIREFOX)
     53

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/extension_connection.pyc in __init__(self, host, firefox_profile, firefox_binary, timeout)
     45         self.profile.add_extension()
     46
---> 47         self.binary.launch_browser(self.profile)
     48         _URL = "http://%s:%d/hub" % (HOST, PORT)
     49         RemoteConnection.__init__(

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/firefox_binary.pyc in launch_browser(self, profile)
     42
     43         self._start_from_profile_path(self.profile.path)
---> 44         self._wait_until_connectable()
     45
     46     def kill(self):

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/firefox_binary.pyc in _wait_until_connectable(self)
     79                 raise WebDriverException("The browser appears to have exited "
     80                       "before we could connect. The output was: %s" %
---> 81                       self._get_firefox_output())
     82             if count == 30:
     83                 self.kill()

WebDriverException: Message: 'The browser appears to have exited before we could connect. The output was: Error: no display specified\n'

I searched the web and found that other people have had this problem (https://groups.google.com/forum/?fromgroups=#!topic/selenium-users/21sJrOJULZY), but I don't understand the solution there, if there is one.

Can anyone help me please? Thanks!

David
  • `Error: no display specified` means that the browser doesn't have a screen to display its main window on. You'll need to find a way to run Firefox headless: http://stackoverflow.com/questions/10060417/python-firefox-headless. This answer in particular looks useful: http://stackoverflow.com/a/6300672/464744 – Blender Oct 23 '12 at 21:31
  • @Blender Thank you so much. The second link solved my problem. Sometimes I just can't find the solution on Google if I don't have the right keywords in mind. – David Oct 23 '12 at 21:52
  • @Blender: how did you get an anchor in your URL pointing to a specific answer on a page? I see no links like this on the pages. – Gilles Quénot Oct 23 '12 at 22:35
  • @sputnick Look at the 'share' href. – John Keyes Oct 24 '12 at 00:04
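On Linux, the X display a program should draw on is normally given by the `DISPLAY` environment variable, and `Error: no display specified` is what Firefox reports when it finds none. As a quick diagnostic before calling `webdriver.Firefox()`, here is a minimal sketch; `has_x_display` is a hypothetical helper, and it only checks that the variable is set, not that the X server it names is actually reachable.

```python
import os


def has_x_display(environ=None):
    """Return True if an X display appears to be configured.

    Only checks that DISPLAY is set and non-empty; it does not try to
    connect to the X server named there.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY", "").strip())


if __name__ == "__main__":
    if has_x_display():
        print("DISPLAY is set; Firefox should have somewhere to draw.")
    else:
        print("No DISPLAY set: start a virtual display (e.g. Xvfb) "
              "or run the browser headless.")
```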

4 Answers


The problem is that Firefox requires a display. In my example I've used pyvirtualdisplay to simulate one. The solution is:

from pyvirtualdisplay import Display
from selenium import webdriver

# Start a virtual X display for Firefox to draw on (needs Xvfb, Xephyr or Xvnc).
display = Display(visible=False, size=(1024, 768))
display.start()

driver = webdriver.Firefox()
try:
    driver.get("http://www.somewebsite.com/")
    print(driver.title)  # replace with your own scraping code
finally:
    # driver.close() would only close the current window;
    # driver.quit() quits the driver and closes every associated window.
    driver.quit()
    display.stop()

Please note that pyvirtualdisplay requires one of the following back-ends: Xvfb, Xephyr, Xvnc.
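If none of those back-ends is installed, `display.start()` has nothing to launch. A minimal sketch for checking which ones are on `PATH` before starting the display, assuming they are installed under their usual executable names; `available_backends` is a hypothetical helper, not part of pyvirtualdisplay.

```python
import shutil

# Executable names of the back-ends pyvirtualdisplay can drive.
BACKENDS = ("Xvfb", "Xephyr", "Xvnc")


def available_backends(which=shutil.which):
    """Return the back-end executables found on PATH, in BACKENDS order.

    `which` is injectable so the lookup can be faked in tests.
    """
    return [name for name in BACKENDS if which(name) is not None]


if __name__ == "__main__":
    found = available_backends()
    if found:
        print("Usable back-ends:", ", ".join(found))
    else:
        print("None found; on Debian/Ubuntu try: sudo apt-get install xvfb")
```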

This should resolve your issue.

Abhishek Kumar
That1Guy
  • Thanks. One quick question: what's the difference between driver.close() and driver.quit()? – David Oct 24 '12 at 19:19
  • To help people in the future who stumble across this: please note that the comment differentiating close/quit from @That1Guy is just plain wrong. Also note that the code in his answer never properly shuts down the underlying driver and may leak processes or file descriptors. `driver.close()` simply closes the current window; it leaves other windows open and the driver active. `driver.quit()` actually quits the driver and closes every associated window. If you want more details about the difference, please read the Selenium WebDriver source code. – Corey Goldberg Jan 06 '16 at 23:18
  • A note to future readers: the comment @CoreyGoldberg is referring to was in fact incorrect. I had not only confused the Selenium methods I mentioned, but also mixed them up with another project I was working on at the time. Please refer to the documentation for [`driver.quit()`](http://selenium-python.readthedocs.org/api.html?highlight=driver.quit#selenium.webdriver.firefox.webdriver.WebDriver.quit) and [`driver.close()`](http://selenium-python.readthedocs.org/api.html?highlight=driver.close#selenium.webdriver.remote.webdriver.WebDriver.close). – That1Guy Jan 07 '16 at 21:11
  • @gvrocha I haven't tested, but I don't see why `pyvirtualdisplay` wouldn't work on a mac. – That1Guy Mar 04 '16 at 15:43
  • I tried the above steps, but I still get the same error. Can anyone help? I installed Xvfb by running sudo apt-get install xvfb and then tried to run the script from this answer, but I still get the same error: "selenium.common.exceptions.WebDriverException: Message: The browser appears to have exited before we could connect..." – sridhar249 Jun 14 '16 at 20:12
  • @sridhar249 Are you sure you remembered the `display.start()` step? – That1Guy Jun 14 '16 at 20:41
  • @That1Guy Yes, I did not miss that. Here is my code: `from pyvirtualdisplay import Display from selenium import webdriver display = Display(visible=0, size=(800, 600)) display.start() browser = webdriver.Firefox() browser.get('http://www.google.com') browser.quit() display.stop() ` – sridhar249 Jun 14 '16 at 21:39
  • @sridhar249 Interesting. I'll have to try a few things and get back to you. – That1Guy Jun 14 '16 at 22:03
  • Thanks @That1Guy. Will look forward to your response. – sridhar249 Jun 14 '16 at 22:12
  • I would like to mention that it worked perfectly for me with Google Chrome. I noticed this issue only with the Firefox driver. – sridhar249 Jun 15 '16 at 04:58
  • @sridhar249 Perhaps this is a bug with the Firefox driver. I was unable to reproduce this. What version of Selenium are you using? – That1Guy Jun 15 '16 at 13:35
  • @That1Guy, I am using Selenium 2.53.0, which is the latest version, and Firefox 47.0, which is also the latest. My OS is 14.04 LTS. Since I am using the latest builds, if this were an existing issue I'd expect other people to be facing it too, but I don't see anyone complaining. Is your Selenium, Firefox or OS different from mine? – sridhar249 Jun 15 '16 at 14:55
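Following up on the close() vs quit() discussion in the comments above: a minimal sketch that guarantees quit() runs even if the scraping code raises, so no browser or driver process is leaked. `quitting` is a hypothetical helper name; it works with any object that has a quit() method, including a Selenium WebDriver.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield a WebDriver-like object and always call its quit() on exit.

    contextlib.closing() is deliberately not used: it calls close(), which
    only closes the current window and leaves the driver running.
    """
    try:
        yield driver
    finally:
        driver.quit()


# Usage with a real driver (not executed here):
#     with quitting(webdriver.Firefox()) as driver:
#         driver.get("http://www.somewebsite.com/")
```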

I faced the same problem too. I was on Firefox 47 and Selenium 2.53, so I downgraded Firefox to 45, and that worked.

1) First, remove Firefox 47:

sudo apt-get purge firefox

2) Check for available versions:

apt-cache show firefox | grep Version

It will show the available Firefox versions, for example:

Version: 47.0+build3-0ubuntu0.16.04.1

Version: 45.0.2+build1-0ubuntu1

3) Install the specific build you want:

sudo apt-get install firefox=45.0.2+build1-0ubuntu1

4) Next, prevent apt from upgrading it to the newer version again:

sudo apt-mark hold firefox

5) If you want to upgrade later:

sudo apt-mark unhold firefox
sudo apt-get upgrade
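Steps 2 and 3 can be scripted. A minimal sketch that pulls the `Version:` lines out of `apt-cache show firefox` output and builds the pinned install command; the helper names are hypothetical, it simply picks the first version whose major number matches (45 here), and the sample input is the output quoted in step 2.

```python
def firefox_versions(apt_cache_output):
    """Extract package versions from `apt-cache show firefox` output."""
    versions = []
    for line in apt_cache_output.splitlines():
        if line.startswith("Version:"):
            versions.append(line.split(":", 1)[1].strip())
    return versions


def pin_command(versions, major="45"):
    """Return the apt-get install command for the first matching major version."""
    for version in versions:
        if version.split(".", 1)[0] == major:
            return "sudo apt-get install firefox=" + version
    return None  # no build with that major version is available


sample = """Version: 47.0+build3-0ubuntu0.16.04.1
Version: 45.0.2+build1-0ubuntu1"""
print(pin_command(firefox_versions(sample)))
# prints: sudo apt-get install firefox=45.0.2+build1-0ubuntu1
```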

Hope this helps.

Amogh Joshi

This is already covered in the comments on the OP's question, but here it is laid out as an answer. You can have Selenium run in the background without opening an actual browser window.

For example, if you use Chrome, set these options:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")  # no window, so no X display is needed

Then pass the options when you create the driver:

browser = webdriver.Chrome(options=chrome_options)
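On a bare server you may want a few more Chrome flags than `--headless`. A minimal sketch that keeps the flag list in one testable function and applies it through `Options.add_argument`; the helper names are hypothetical, and whether you need `in_container=True` (the `--no-sandbox` and `--disable-dev-shm-usage` flags often used for Chrome inside Docker) is an assumption about your environment.

```python
def headless_chrome_args(window_size=(1024, 768), in_container=False):
    """Chrome command-line flags for running without a display."""
    width, height = window_size
    args = ["--headless", "--window-size={},{}".format(width, height)]
    if in_container:
        # Often needed in Docker: --no-sandbox when running as root,
        # --disable-dev-shm-usage when /dev/shm is small.
        args += ["--no-sandbox", "--disable-dev-shm-usage"]
    return args


def make_headless_chrome(**kwargs):
    """Create a headless Chrome WebDriver (needs selenium and chromedriver)."""
    # Imported here so headless_chrome_args() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in headless_chrome_args(**kwargs):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```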
David Skarbrevik

For Debian 10 and Ubuntu 18.04, here is a complete working example:

  1. Download ChromeDriver into ~/Downloads (its major version must match your installed Chrome; this URL is for version 80):
    $ wget https://chromedriver.storage.googleapis.com/80.0.3987.16/chromedriver_linux64.zip

  2. Unpack it:
    $ unzip chromedriver_linux64.zip

  3. Move the binary to a directory that is already on your PATH:
    $ sudo mv chromedriver /usr/local/bin
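Before running the script, it can help to confirm that `chromedriver` is on `PATH` and that its major version matches your Chrome (ChromeDriver 80 is built for Chrome 80). A minimal sketch; the helpers are hypothetical, and the version strings assume the usual `chromedriver --version` / `google-chrome --version` output format (the Chrome executable may be named differently, e.g. `chromium`, on your system).

```python
import re
import shutil
import subprocess


def major_version(version_output):
    """Return the major version from text like 'ChromeDriver 80.0.3987.16'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(driver_output, chrome_output):
    """ChromeDriver and Chrome should share the same major version."""
    driver_major = major_version(driver_output)
    return driver_major is not None and driver_major == major_version(chrome_output)


def installed_chromedriver_version():
    """Run `chromedriver --version` if it is on PATH, else return None."""
    path = shutil.which("chromedriver")
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.stdout.strip()
```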

Then run this code in a Jupyter notebook or in a script:

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")  # no window, so no X display is needed

browser = Chrome(options=chrome_options)
try:
    browser.get('http://www.linkedin.com/')
    print(browser.page_source)
finally:
    browser.quit()  # shut down Chrome and the chromedriver process

This will print the full HTML source of the page.

f0nzie