
I want to take a screenshot of this page: http://books.google.de/books?id=gikDAAAAMBAJ&pg=PA1&img=1&w=2500, or save the image it outputs.

But I can't find a way. With wget/curl I get an "unavailable" error, and the same happens with other tools like webkit2png/wkhtmltoimage/wkhtmltopng.

Is there a clean way to do it with Python or from the command line?

Best regards!

danbruegge
  • [I believe this was answered in a different thread?](http://stackoverflow.com/questions/69645/take-a-screenshot-via-a-python-script-linux) – user856358 May 02 '13 at 18:03
  • As far as I understand, that approach takes a screenshot of an open window, not of a web page. My plan is to do it without opening the URL myself: there are ~1000 images to save, only the covers of some books. – danbruegge May 02 '13 at 18:10

3 Answers


You can use Ghost.py if you like: https://github.com/jeanphix/Ghost.py

Here is an example of how to use it.

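from ghost import Ghost

# Ghost.py API of the time (later releases moved to a session-based API)
ghost = Ghost(wait_timeout=4)         # give up waiting for the page after 4 seconds
ghost.open('http://www.google.com')   # load the page in Ghost's WebKit client
ghost.capture_to('screen_shot.png')   # render the loaded page to a PNG file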

The last line saves the image in your current directory.

Hope this helps

mbomb007
Sason Torosean

I had difficulty getting Ghost to take a screenshot consistently on a headless CentOS VM. Selenium and PhantomJS worked for me:

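from selenium import webdriver

# Needs the phantomjs binary on PATH; webdriver.PhantomJS is gone in Selenium 4.
br = webdriver.PhantomJS()
br.get('http://www.stackoverflow.com')
br.save_screenshot('screenshot.png')  # writes a PNG of the rendered page
br.quit()  # the parentheses are required, otherwise the browser is never closed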
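PhantomJS is no longer developed, and current Selenium releases no longer ship a PhantomJS driver. A minimal sketch of an alternative that skips Selenium entirely, assuming a Chrome or Chromium install: headless Chrome can write the screenshot itself through its `--headless`, `--window-size` and `--screenshot` switches. The executable name `google-chrome` and the helper names are assumptions; on many Linux systems the binary is called `chromium` or `chromium-browser`.

```python
import shutil
import subprocess


def build_screenshot_command(url, output_path, browser="google-chrome",
                             width=1280, height=1024):
    """Return the argv list for a headless Chrome screenshot.

    `browser` is an assumed executable name; adjust it for your install.
    """
    return [
        browser,
        "--headless",                        # no visible window needed
        f"--window-size={width},{height}",   # size of the captured viewport
        f"--screenshot={output_path}",       # where Chrome writes the PNG
        url,
    ]


def take_screenshot(url, output_path, browser="google-chrome",
                    width=1280, height=1024):
    """Run headless Chrome once and save a PNG screenshot of `url`."""
    if shutil.which(browser) is None:
        raise FileNotFoundError(f"{browser!r} not found on PATH")
    cmd = build_screenshot_command(url, output_path, browser, width, height)
    # check=True raises CalledProcessError if Chrome exits with an error code.
    subprocess.run(cmd, check=True, timeout=60)
```

For example, `take_screenshot('http://www.stackoverflow.com', 'screenshot.png')`. Only the viewport is captured, so pass a larger `height` for long pages.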
Rahul K P
billrichards
  • I am getting this error when running this:`Traceback (most recent call last): File "C:\bunker\Lib\site-packages\custom_selenium.py", line 2, in br = webdriver.PhantomJS() File "C:\bunker\Lib\site-packages\selenium\webdriver\phantomjs\webdriver.py", line 49, in __init__ service_args=service_args,log_path=service_log_path) TypeError: __init__() got an unexpected keyword argument 'log_path'` – Ashish Gupta Oct 01 '14 at 04:15
  • Hmm, not sure, but I wonder what happens if you edit `__init__` in webdriver.py and remove the `log_path` argument. – billrichards Oct 08 '14 at 17:26

Sometimes you need extra HTTP headers, such as User-Agent, to get downloads to work. In Python 2.7, you can:

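import urllib2  # Python 2 only; Python 3 has urllib.request instead

# Send a browser-like User-Agent, since some servers reject urllib2's default one
request = urllib2.Request(
    r'http://books.google.de/books?id=gikDAAAAMBAJ&pg=PA1&img=1&w=2500',
    headers={'User-Agent':'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 firefox/2.0.0.11'})
page = urllib2.urlopen(request)

# Write the raw response bytes (the image itself) to disk
with open('somefile.png','wb') as f:
    f.write(page.read())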

Alternatively, wget and curl can send the same header: `wget --user-agent="..."` or `curl -A "..."` (and `--header` / `-H` for arbitrary headers).
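`urllib2` exists only in Python 2. A minimal Python 3 sketch of the same idea with `urllib.request`, extended to loop over several books because the question mentions about 1000 covers. The URL pattern is copied from the question; the helper names, the list of book IDs you pass in, and the User-Agent string are illustrative assumptions, and the `.png` extension simply mirrors the snippet above (check the response's Content-Type if the format matters).

```python
import urllib.parse
import urllib.request

# Example browser-like User-Agent; any realistic value should work the same way.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/115.0"


def cover_url(book_id, width=2500):
    """Hypothetical helper: build the cover-image URL pattern from the question."""
    query = urllib.parse.urlencode(
        {"id": book_id, "pg": "PA1", "img": 1, "w": width})
    return "http://books.google.de/books?" + query


def build_request(url):
    """Wrap `url` in a Request that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def download(url, path, timeout=30):
    """Fetch `url` and write the raw response body to `path`."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)


def download_covers(book_ids):
    """Save one cover per book ID as <book_id>.png in the current directory."""
    for book_id in book_ids:
        download(cover_url(book_id), f"{book_id}.png")
```

For example, `download_covers(["gikDAAAAMBAJ"])` fetches the cover from the question into `gikDAAAAMBAJ.png`.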

tdelaney
  • Yet it will not produce an image of the captured website. The image will be broken. – Mahadeva Jan 18 '16 at 09:17
  • @SarvagyaPant I ran this script and verified that a non-broken image is downloaded. This took me less than a minute. Can you please put a minimum of work in before making unsubstantiated claims. – tdelaney Jan 18 '16 at 17:31
  • It will produce a correct image only when the `url` is a direct link to an image. For other HTML-based web pages, this won't work. Moreover, one can directly use `urllib.urlretrieve` if the URL is guaranteed to be an image. – Mahadeva Jan 19 '16 at 08:19
  • It works for any single resource such as an image, a web page, an mp3, pdf and etc... It doesn't follow links or build a composite web page, but that's not what the user was after. He showed us a url to an image and said he wanted a "screenshot" of the image. But the "screenshot" is just the image file itself. There are multiple ways to download web content - my example is a perfectly normal accepted way. – tdelaney Jan 19 '16 at 17:36
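On the `urlretrieve` suggestion: in Python 3 it lives at `urllib.request.urlretrieve`, and it has no parameter for request headers. One way to still send a User-Agent is to install a global opener whose `addheaders` list carries it, since `urlretrieve` goes through the installed opener. A minimal sketch; the helper names and the User-Agent value are illustrative.

```python
import urllib.request


def install_user_agent(user_agent):
    """Make every later urllib.request call (urlretrieve included) send `user_agent`."""
    opener = urllib.request.build_opener()
    # Replaces the default [('User-agent', 'Python-urllib/3.x')] header list.
    opener.addheaders = [("User-Agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener


def fetch(url, path):
    """Download `url` straight to `path` and return the local filename."""
    filename, _headers = urllib.request.urlretrieve(url, path)
    return filename
```

Usage: call `install_user_agent("Mozilla/5.0 ...")` once, then `fetch(url, "cover.png")` for each image. The trade-off is that the opener is process-wide, so every other `urlopen` call in the program sends the same header.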