216

I want to use PhantomJS in Python. I googled this problem but couldn't find proper solutions.

I found that os.popen() might be a good choice, but I couldn't pass arguments to it.

For now, subprocess.Popen() seems like a workable solution, but I'd like to know whether there's a better one.

Is there a way to use PhantomJS in Python?

Remi Guan
flyer
  • My answer below tells you how to do it. Just looking at your question and actually thats exactly what Selenium does, a `subprocess.popen` but with some extended features to make the api seamless. – Pykler Mar 20 '15 at 17:29
  • @flyer: You should probably consider changing the accepted answer, see below. Thank you. – dotancohen Dec 24 '15 at 09:27

8 Answers

388

The easiest way to use PhantomJS in Python is via Selenium. The simplest installation method is:

  1. Install NodeJS
  2. Using Node's package manager, install PhantomJS: npm -g install phantomjs-prebuilt
  3. Install Selenium (in your virtualenv, if you are using one)

After installation, using PhantomJS is as simple as:

from selenium import webdriver

driver = webdriver.PhantomJS() # or add to your PATH
driver.set_window_size(1024, 768) # optional
driver.get('https://google.com/')
driver.save_screenshot('screen.png') # save a screenshot to disk
sbtn = driver.find_element_by_css_selector('button.gbqfba')
sbtn.click()

If your system path environment variable isn't set correctly, you'll need to specify the exact path as an argument to webdriver.PhantomJS(). Replace this:

driver = webdriver.PhantomJS() # or add to your PATH

... with the following:

driver = webdriver.PhantomJS(executable_path='/usr/local/lib/node_modules/phantomjs/lib/phantom/bin/phantomjs')
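
Rather than hard-coding the path, you could look the binary up first and pass whatever is found. This is only a rough sketch: find_phantomjs and the extra_dirs argument are hypothetical names of mine, not part of Selenium, and the commented usage assumes a Selenium release that still ships webdriver.PhantomJS.

```python
import os
import shutil


def find_phantomjs(extra_dirs=()):
    """Return the full path to a phantomjs executable, or None if not found."""
    # First look in the directories already listed on PATH.
    found = shutil.which("phantomjs")
    if found:
        return found
    # Then look in any extra directories the caller suggests.
    if extra_dirs:
        return shutil.which("phantomjs", path=os.pathsep.join(extra_dirs))
    return None


# Usage (commented out because it needs Selenium and PhantomJS installed):
# from selenium import webdriver
# path = find_phantomjs(["/usr/local/lib/node_modules/phantomjs/lib/phantom/bin"])
# if path is None:
#     raise RuntimeError("phantomjs executable not found")
# driver = webdriver.PhantomJS(executable_path=path)
```

Failing early with a clear message when the lookup returns None tends to be easier to debug than the generic "Unable to start phantomjs" error.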


davidjb
Pykler
  • 45
    This worked beautifully, and probably saved me days. Thank you. If one wants the whole rendered page back as source, it's `driver.page_source`. – scharfmn Apr 12 '13 at 15:12
  • Also using [error_handler](http://selenium.googlecode.com/svn/trunk/docs/api/py/webdriver_remote/selenium.webdriver.remote.webdriver.html) parameter when initializing PhantomJS WebDriver one can verify the status code is 200 or otherwise – Pykler Apr 17 '13 at 03:56
  • 4
    This does work beautifully, and I'm pleasantly surprised because http://phantomjs.org/faq.html says "not a Node.js module" --yet the npm wrapper at https://npmjs.org/package/phantomjs makes it behave for this purpose. In my case I wanted to do this: `bodyStr= driver.find_element_by_tag_name("body").get_attribute("innerHTML")` and ...it worked! – MarkHu Apr 17 '13 at 23:02
  • Just to offer my experience after following the advice of avoiding Ghost's dependency complexities. After fully going down the road of PhantomJS, I found myself forking and then recompiling Phantom because their interface to the underlying QT libraries expected too much configuration through command line arguments. So in avoiding complexity, I found myself writing C++ to modify an interface that is too simplified. Not to say that Phantomjs is bad advice, I would just advise to look into its limitations first as its compilation time is about 20 minutes minimum if you need to modify its source. – brandon May 11 '13 at 18:29
  • 8
    I agree that ghost has crazy dependencies, and I actually failed to get it up and running even after installing millions of X11 related libraries. Ghost is a horror story. – Pykler May 13 '13 at 21:52
  • 1
    @brandon I am surprised you needed to recompile phantomjs ... I am curious about the usecase/brickwall you faced with vanilla phantomjs – Pykler May 13 '13 at 21:54
  • 1
    Thank you for this answer. This will probably save me a lot of time :-) – raben May 25 '13 at 12:22
  • 1
    I get "WebDriverException :Unable to start phantomjs with ghostdriver" I couldn't find what might cause this error. Can any one help? I'm using python 2.7 with windows. – phabtar Jul 19 '13 at 08:19
  • 5
    @phabtar You need to pass the path to phantomjs as the first argument to PhantomJS ... or fix your windows syspath to be able to see phantomjs. – Pykler Jul 23 '13 at 15:54
  • 1
    I was using sub process, but this is truly better. – Saintt Sheldon Patnett Aug 03 '13 at 06:28
  • Before this would run successfully, I had to create and give permission to the log file at /var/log/phantomjs/ghostdriver.log – andyzinsser Oct 10 '13 at 22:29
  • After hitting my head against trying to copy paste the examples from phantomjs and casperjs I gave up and giving this a try. Not to say they are horrible, I used webdriver before, and the ability to switch browser (phantomjs doesn't work with some sites) is a huge win. – KJW Oct 16 '13 at 04:18
  • I had some issues getting this to work: I got either execvp(): Permission denied errors when running phantomjs from the console, or Can not connect to GhostDriver errors. The solution was to run sudo phantomjs once; from then on it works fine. https://github.com/ariya/phantomjs/issues/11614 – MrBrightside Oct 29 '13 at 21:18
  • 1
    ABORT. EJECT. Avoid phantomjs for python. Waste of time. "Unable to start phantomjs with ghostdriver." Into eternity. Dev admitted to not updating something or other. Wish I knew this before spending hours trying to breathe life into phantomjs. – Alkanshel Nov 16 '13 at 05:12
  • @Amalgovinus I did notice this occasionally, it is something that happens if you run out of memory IIRC. Do you know where the link is to that dev material you mention. I would like to read-up on it. – Pykler Nov 17 '13 at 13:19
  • 1
    PhantomJS was installed in AppData. Is there a permanent fix besides specifying it as an argument? – User Feb 13 '14 at 20:19
  • @macdonjo yes, make it visible on your system $PATH variable – Pykler Feb 13 '14 at 20:33
  • Thanks, I did that earlier as a guess, but still can't get the error to go away. So strange. I think I'll make an independent thread. – User Feb 14 '14 at 00:19
  • 2
    Dumb question: why do I have to install node-js? Is there no other way to get PhantomJS? – Heetola Mar 29 '15 at 13:16
  • 1
    @Elidosa not a dumb question, its a sys admin style question ... not wanting to install what you don't need. Phantomjs is written in nodejs, thats why you need it in this case. Some distros have packages for phantomjs that will install nodejs behind the scenes. – Pykler Apr 01 '15 at 19:05
  • Selenium does not allow full control over PhantomJS such as handling callbacks. – Charlesthk May 16 '15 at 18:04
  • 2
    @Pykler PhantomJS is **not** written in node. It's written in C++ and is a headless webkit browser. The node package just installs the appropriate binary. – Vivin Paliath May 31 '15 at 21:55
  • 3
    Under Windows, I did not have to install `phantomJS` via `node` and `npm`. Downloading the binary from http://phantomjs.org/download.html and putting the `phantomjs.exe` into a location in my PATH (e.g. `c:\Windows\System32`) or vice versa (putting it anywhere and adding the folder to PATH) was enough to make it work in Python. – Dirk Sep 01 '15 at 17:26
  • Great answer. `sudo apt-get install phantomjs` worked for me in ubuntu tho. I had a failed install with npm previously. – gabbar0x Mar 29 '16 at 15:59
  • To make node/npm working correct with Ubuntu add this package: apt-get install nodejs-legacy (https://github.com/Medium/phantomjs#im-on-debian-or-ubuntu-and-the-installer-failed-because-it-couldnt-find-node) – Martin Krung Mar 08 '17 at 15:30
  • I managed to install PhantomJS on Ubuntu 16 using this command: `npm -g install phantomjs` – Markus Jan 14 '18 at 12:14
85

PhantomJS recently dropped Python support altogether. However, PhantomJS now embeds Ghost Driver.

A new project has since stepped up to fill the void: ghost.py. You probably want to use that instead:

from ghost import Ghost
ghost = Ghost()

with ghost.start() as session:
    page, extra_resources = session.open("http://jeanphi.me")
    assert page.http_status == 200 and 'jeanphix' in session.content
Martijn Pieters
  • 22
    Even though support is dropped, I found that installing npm (node package manager) and using it to install the latest phantomjs (with webdriver support) and installing selenium in python ... way easier than trying to get PyQT or PySide to work properly. What's nice about phantom it is truly headless and requires no UI/X11 related libs to work. – Pykler Mar 29 '13 at 08:09
  • 13
    I added an answer below explaining my preferred solution after trying to use ghost.py and hating my life – Pykler Mar 29 '13 at 08:25
  • 8
    Pykler's "hating my life" isn't an understatement. If someone would change the "correct answer" for this question to Pykler's I would have saved a day's effort. – YPCrumble Sep 29 '13 at 17:33
  • 2
    @YPCrumble: unfortunately, only the OP can do that; change the accepted answer. – Martijn Pieters Sep 29 '13 at 18:10
  • 3
    After trying a bunch of different approaches this morning, @Pykler solution ended up working the smoothest. – andyzinsser Oct 10 '13 at 22:27
  • 1
    Though I don't like its syntax compared to PhantomJS, getting Ghost.py working with PyQT was bearable. I just had to change the code a little as mentioned here-- http://stackoverflow.com/questions/14575181/screen-scraping-using-ghost-py All in all, setting up Ghost.py was way easier than trying to get phantomjs working in selenium in python, which has proven impossible on a windows machine after hours of trying. – Alkanshel Nov 17 '13 at 03:44
  • @Amalgovinus are you still using ghost.py or were you able to get phantomjs going after that [github issue you mentioned](https://github.com/detro/ghostdriver/issues/236)? – Pykler Dec 12 '13 at 23:59
  • @Pykler I gave up on ghost.py because it lacks cookies.. wound up using lorien's Grab library instead, even though it lacks js support. I did manage to get phantomjs dialing out after the fact (by tweaking constructor params) since I asked a question about it-- http://superuser.com/questions/674322/python-selenium-phantomjs-unable-to-start-phantomjs-with-ghostdriver -- but there were problems after that, so I stuck with Grab. – Alkanshel Dec 13 '13 at 19:29
  • will ghost.py allow me to force the javascript on the page to load, so I can grab all the stuff (div, img, href, ) that the javascript loads on my page? i'm looking for a solution which I can then parse with Beautifulsoup (BS) – yoshiserry May 17 '14 at 11:50
  • Ghost is headless? I don't see that anywhere in the project and given the pyside/qt dependency I'm doubtful. – Ryne Everett Sep 03 '15 at 20:28
  • @RyneEverett: Ghost is headless. – Martijn Pieters Sep 03 '15 at 20:30
40

Since GhostDriver now comes bundled with PhantomJS, it has become even more convenient to use PhantomJS through Selenium.

I tried the Node installation of PhantomJS, as suggested by Pykler, but in practice I found it to be slower than the standalone installation of PhantomJS. I guess the standalone installation didn't provide these features earlier, but as of v1.9 it does.

  1. Install PhantomJS (http://phantomjs.org/download.html) (If you are on Linux, following instructions will help https://stackoverflow.com/a/14267295/382630)
  2. Install Selenium using pip.

Now you can use it like this:

import selenium.webdriver
driver = selenium.webdriver.PhantomJS()
driver.get('http://google.com')
# do some processing

driver.quit()
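
If the processing step raises an exception, driver.quit() is never reached and the PhantomJS process can be left running. Here is a minimal sketch of one way to guard against that. managed_driver is a hypothetical helper name, not something Selenium provides, and the commented usage assumes a Selenium release that still ships webdriver.PhantomJS.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always call quit() on it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs whether the body finished normally or raised.
        driver.quit()


# Usage (commented out because it needs Selenium and PhantomJS installed):
# import selenium.webdriver
# with managed_driver(selenium.webdriver.PhantomJS) as driver:
#     driver.get('http://google.com')
#     # do some processing
```

A plain try/finally around the processing code would do the same job; the context manager just keeps the cleanup in one place if you create drivers in several spots.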
Community
Pankaj
  • 3
    special thanks for pointing to SO answer concerning PhantomJS installation on Ubuntu, it helped me. – Dennis Golomazov Jul 21 '13 at 13:35
  • a quick way to install Selenium I just learned is, on Windows, type: C:\Python34\Scripts\pip.exe install Selenium. – ntk4 Sep 21 '16 at 05:06
8

Here's how I test JavaScript using PhantomJS and Django:

mobile/test_no_js_errors.js:

var page = require('webpage').create(),
    system = require('system'),
    url = system.args[1],
    status_code;

page.onError = function (msg, trace) {
    console.log(msg);
    trace.forEach(function(item) {
        console.log('  ', item.file, ':', item.line);
    });
};

page.onResourceReceived = function(resource) {
    if (resource.url == url) {
        status_code = resource.status;
    }
};

page.open(url, function (status) {
    if (status == "fail" || status_code != 200) {
        console.log("Error: " + status_code + " for url: " + url);
        phantom.exit(1);
    }
    phantom.exit(0);
});

mobile/tests.py:

import subprocess
from django.test import LiveServerTestCase

class MobileTest(LiveServerTestCase):
    def test_mobile_js(self):
        args = ["phantomjs", "mobile/test_no_js_errors.js", self.live_server_url]
        result = subprocess.check_output(args)
        # check_output returns bytes on Python 3; empty output means no error
        self.assertEqual(result, b"")

Run tests:

manage.py test mobile
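
One thing to be aware of: since the script calls phantom.exit(1) on failure, subprocess.check_output raises CalledProcessError in that case, and the message printed by the script is easy to miss. A rough sketch of a wrapper that returns both the exit code and the output follows. run_script and its timeout default are my own hypothetical choices, not part of Django or PhantomJS.

```python
import subprocess


def run_script(args, timeout=60):
    """Run a command and return (exit_code, decoded_output)."""
    try:
        # Merge stderr into stdout so error text is not lost.
        output = subprocess.check_output(
            args, stderr=subprocess.STDOUT, timeout=timeout
        )
        return 0, output.decode("utf-8")
    except subprocess.CalledProcessError as exc:
        # Non-zero exit: the captured output is attached to the exception.
        return exc.returncode, exc.output.decode("utf-8")


# Usage inside the test case (commented out because it needs PhantomJS):
# code, out = run_script(
#     ["phantomjs", "mobile/test_no_js_errors.js", self.live_server_url])
# self.assertEqual(code, 0, out)  # the script's output becomes the failure message
```

A timeout still propagates as subprocess.TimeoutExpired, which fails the test with a clear reason.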

Emil Stenström
  • Thanks. I used **subprocess.Popen** to call the phantomjs script and it worked :) – flyer Dec 19 '12 at 11:19
  • You do see how this is limited right? All you are doing is making a shell call to execute phantomjs - you are not actually using a "proper" interface through which you may properly handle exceptions, blocking, etc. – kamelkev May 05 '13 at 01:58
  • @kamelkev: I see how this is limited. The upside is that this method allows me to use Django's bootstraping features to set up a test database with the correct content for each test. And yes, it could be combined with the other answers to get the best of both worlds. – Emil Stenström May 06 '13 at 10:30
6

The answer by @Pykler is great but the Node requirement is outdated. The comments in that answer suggest the simpler answer, which I've put here to save others time:

  1. Install PhantomJS

    As @Vivin-Paliath points out, it's a standalone project, not part of Node.

    Mac:

    brew install phantomjs
    

    Ubuntu:

    sudo apt-get install phantomjs
    

    etc

  2. Set up a virtualenv (if you haven't already):

    virtualenv mypy  # doesn't have to be "mypy". Can be anything.
    . mypy/bin/activate
    

    If your machine has both Python 2 and 3, you may need to run virtualenv-3.6 mypy or similar.

  3. Install selenium:

    pip install selenium
    
  4. Try a simple test, like this borrowed from the docs:

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    
    driver = webdriver.PhantomJS()
    driver.get("http://www.python.org")
    assert "Python" in driver.title
    elem = driver.find_element_by_name("q")
    elem.clear()
    elem.send_keys("pycon")
    elem.send_keys(Keys.RETURN)
    assert "No results found." not in driver.page_source
    driver.close()
    
Community
Andrew E
  • How to install `PhantomJS` on windows ? It doesn't seem to work using `pip` command. – MD. Khairul Basar Mar 19 '17 at 14:18
  • 1
    Pip is a python package installer, so it works with selenium, which is available as a python package. PhantomJS is not a python package so won't work with pip. I did a quick google for "PhantomJS install windows" and there are good hits. – Andrew E Mar 19 '17 at 15:11
5

This is what I do on Python 3.3. I was processing huge lists of sites, so failing on the timeout was vital for the job to run through the entire list.

import subprocess

script = "<your js file for phantom>"  # placeholder: path to your PhantomJS script
command = ["phantomjs", "--ignore-ssl-errors=true", script]
process = subprocess.Popen(command, stdout=subprocess.PIPE)

# make sure phantomjs has time to download/process the page
# but if we get nothing after 30 sec, just move on
try:
    output, errors = process.communicate(timeout=30)
except subprocess.TimeoutExpired as e:
    print("\t\tException: %s" % e)
    process.kill()
    # reap the killed process and keep whatever it printed before the timeout
    output, errors = process.communicate()

# output is bytes; decode each line to text (lines are joined without newlines)
phantom_output = ''
for out_line in output.splitlines():
    phantom_output += out_line.decode('utf-8')
tlib
5

If using Anaconda, install with:

conda install PhantomJS

In your script:

from selenium import webdriver
driver = webdriver.PhantomJS()

Works perfectly.

clg4
2

In case you are using Buildout, you can easily automate the installation processes that Pykler describes using the gp.recipe.node recipe.

[nodejs]
recipe = gp.recipe.node
version = 0.10.32
npms = phantomjs
scripts = phantomjs

That part installs node.js as a binary (at least on my system) and then uses npm to install PhantomJS. Finally, it creates an entry point, bin/phantomjs, which you can pass to the PhantomJS webdriver. (To install Selenium, you need to specify it in your egg requirements or in the Buildout configuration.)

driver = webdriver.PhantomJS('bin/phantomjs')
Dawn Drescher
  • 1
    another way to automate installation process with buildout it's just use `gp.recipe.phantomjs`, that configures `phantomjs` and `casperjs` – gakhov Oct 30 '14 at 15:01