I'm trying to obtain images from Google Image Search for a specific query, but the page I download contains no pictures and redirects me to Google's original page. Here's my code:

import subprocess
import urllib

AGENT_ID   = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

GOOGLE_URL = "https://www.google.com/images?source=hp&q={0}"

_myGooglePage = ""

def scrape(self, theQuery):
    # Fetch the results page with curl, following redirects (-L) and spoofing the user agent (-A)
    self._myGooglePage = subprocess.check_output(["curl", "-L", "-A", self.AGENT_ID, self.GOOGLE_URL.format(urllib.quote(theQuery))], stderr=subprocess.STDOUT)
    print self.GOOGLE_URL.format(urllib.quote(theQuery))
    print self._myGooglePage
    f = open('./../../googleimages.html', 'w')
    f.write(self._myGooglePage)
    f.close()

What am I doing wrong?

Thanks

slwr

5 Answers

This is the code in Python that I use to search and download images from Google, hope it helps:

import os
import sys
import time
from urllib import FancyURLopener
import urllib2
import simplejson

# Define search term
searchTerm = "hello world"

# Replace spaces ' ' in search term for '%20' in order to comply with request
searchTerm = searchTerm.replace(' ','%20')


# Start FancyURLopener with defined version 
class MyOpener(FancyURLopener): 
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
myopener = MyOpener()

# Counter used to name the downloaded image files
count = 0

for i in range(0,10):
    # Notice that the start changes for each iteration in order to request a new set of images for each loop
    url = ('https://ajax.googleapis.com/ajax/services/search/images?' + 'v=1.0&q='+searchTerm+'&start='+str(i*4)+'&userip=MyIP')
    print url
    request = urllib2.Request(url, None, {'Referer': 'testing'})
    response = urllib2.urlopen(request)

    # Get results using JSON
    results = simplejson.load(response)
    data = results['responseData']
    dataInfo = data['results']

    # Iterate for each result and get unescaped url
    for myUrl in dataInfo:
        count = count + 1
        print myUrl['unescapedUrl']

        myopener.retrieve(myUrl['unescapedUrl'],str(count)+'.jpg')

    # Sleep for one second to prevent IP blocking from Google
    time.sleep(1)

You can also find very useful information here.

Jaime Ivan Cervantes
  • Is it possible to define the image type in the URL given to Google? – erogol Aug 09 '14 at 09:11
  • I haven't looked at this for a while, but check the latest Google API. I think the answer is yes: you can refine your search to ".png", ".jpg", and even the vector-based format ".svg". – Jaime Ivan Cervantes Aug 09 '14 at 17:41

Here's a short script I wrote that does the whole deed.

crizCraig
  • Hello, your script seems to be using PIL. Unfortunately I have HUGE problems installing PIL on this machine. Since I just need the images, without transforming them in any way, is there a way to get away without it? – Pietro Speroni Jul 08 '12 at 10:18
  • I'm not sure how to avoid PIL, but if you're on a Mac I highly recommend MacPorts; it simplifies package installation and will install PIL for you. – crizCraig Jul 09 '12 at 20:07

I'll give you a hint ... start here:

https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=JULIE%20NEWMAR

Where JULIE and NEWMAR are your search terms.

That will return the JSON data you need. Parse it using json.load or simplejson.load to get back a dict, then dive into it: first the responseData, then the results list, which contains the individual items whose url you will then want to download.

Though I don't suggest in any way doing automated scraping of Google, since their (deprecated) API for this specifically says not to.
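
For illustration, a minimal sketch of that parsing flow (Python 2, in the same style as the answer above; it assumes the deprecated API still responds, and the Referer value is an arbitrary placeholder):

import json
import urllib2

url = ('https://ajax.googleapis.com/ajax/services/search/images'
       '?v=1.0&q=JULIE%20NEWMAR')
request = urllib2.Request(url, None, {'Referer': 'example'})
response = urllib2.urlopen(request)

# Parse the JSON body, then dive in: responseData -> results -> unescapedUrl
results = json.load(response)
for item in results['responseData']['results']:
    print item['unescapedUrl']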

michaelfilms

I know this is an old question, but I'm going to answer it anyway: there is a much simpler way to go about doing this.

import json
import urllib.request

def google_image(x):
    # Build the query string, replacing spaces with '%20'
    search = '%20'.join(x.split())
    url = 'http://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=%s&safe=off' % search
    search_results = urllib.request.urlopen(url)
    js = json.loads(search_results.read().decode())
    results = js['responseData']['results']
    # Walk the results; 'rest' ends up holding the last result's URL
    for i in results:
        rest = i['unescapedUrl']
    return rest

That's it.
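
A quick usage sketch (assuming the deprecated endpoint still responds; note that the loop above keeps only the last result's URL):

print(google_image('hello world'))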

riyoken

One of the best ways is to use icrawler. Check the answer linked below; it works for me.

https://stackoverflow.com/a/51204611/4198099
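
For reference, a minimal icrawler sketch along the lines of the linked answer (the keyword and output directory are placeholders):

from icrawler.builtin import GoogleImageCrawler

# Download up to 10 images matching the keyword into ./images
crawler = GoogleImageCrawler(storage={'root_dir': 'images'})
crawler.crawl(keyword='hello world', max_num=10)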

Ravi Hirani