8

I'm having a very tough time searching Google Images with Python. I need to do it using only standard Python libraries (urllib, urllib2, json, and so on).

Can somebody please help? Assume the image is jpeg.jpg and is in the same folder I'm running Python from.

I've tried a hundred different versions of the code, using headers, a user-agent, base64 encoding, and different URLs (images.google.com, http://images.google.com/searchbyimage?hl=en&biw=1060&bih=766&gbv=2&site=search&image_url={{URL To your image}}&sa=X&ei=H6RaTtb5JcTeiALlmPi2CQ&ved=0CDsQ9Q8, etc.).

Nothing works; I always get an error: 404, 401 or a broken pipe :(

Please show me a Python script that will actually search Google Images with my own image as the search data ('jpeg.jpg' stored on my computer/device).

Thank you to whoever can solve this,

Dave:)

user1488252
  • It's probably not all that surprising that Google is better at stopping you from scraping their pages than you are at circumventing their protection. – Wooble Jun 28 '12 at 13:02
  • No, it's more that I just don't understand urllib2. Whether I search by browser or by Python on my Android phone, I can sometimes post without errors, but I just don't understand the result I get back. I've been studying urllib2 for days now and it seems all over the place: there are mimetypes, headers, several varieties of urllib, and then altered recipes, yet no manual on how to use urllib or urllib2 properly. There are many posts online, but each one is different. For example, here's one that posts to Google Translate: – user1488252 Jun 29 '12 at 08:07
  • https://bitbucket.org/vgavro/google_translate/src/19807740244a/google_translate.py – user1488252 Jun 29 '12 at 08:18
  • This python script might help: http://bit.ly/QjIy21 – EyalAr Oct 15 '12 at 23:34
  • http://stackoverflow.com/a/22871658/538284 – Omid Raha Apr 05 '14 at 22:37
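
The part of the question that causes most of the confusion, sending a local file in an HTTP POST with only the standard library, can be sketched roughly as below. This is a minimal sketch, not a working Google client: it assumes Python 3 (where urllib2 became urllib.request), and the helper names `encode_multipart` and `build_upload_request`, the upload URL and the form field name are all hypothetical placeholders. The real endpoint and field name would have to be taken from whatever service you post to.

```python
import mimetypes
import uuid
import urllib.request


def encode_multipart(field_name, filename, file_bytes):
    """Return (body, content_type) for a single-file multipart/form-data POST."""
    # A random boundary that is very unlikely to occur inside the file data
    boundary = uuid.uuid4().hex
    # Guess the MIME type from the file extension, with a generic fallback
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{n}"; filename="{f}"\r\n'
        "Content-Type: {m}\r\n"
        "\r\n"
    ).format(b=boundary, n=field_name, f=filename, m=mime).encode("utf-8")
    tail = "\r\n--{b}--\r\n".format(b=boundary).encode("utf-8")
    return head + file_bytes + tail, "multipart/form-data; boundary=" + boundary


def build_upload_request(url, field_name, filename, file_bytes):
    """Build (but do not send) a POST request carrying the file."""
    body, content_type = encode_multipart(field_name, filename, file_bytes)
    headers = {"Content-Type": content_type, "User-Agent": "Mozilla/5.0"}
    # Passing data= makes urllib issue a POST instead of a GET
    return urllib.request.Request(url, data=body, headers=headers)


# Hypothetical usage; the URL and the field name "image" are placeholders:
# with open("jpeg.jpg", "rb") as f:
#     req = build_upload_request("https://example.com/upload", "image",
#                                "jpeg.jpg", f.read())
# response = urllib.request.urlopen(req)
```

Building the request separately from sending it makes it possible to print the headers and body and compare them with what a browser sends, which is usually the quickest way to see why a server answers with 4xx errors.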

2 Answers

0

I use the following Python code to search Google Images by keyword and download the results to my computer:

import time
from urllib import FancyURLopener
import urllib2
import simplejson

# Define search term
searchTerm = "hello world"

# Replace spaces ' ' in search term for '%20' in order to comply with request
searchTerm = searchTerm.replace(' ','%20')


# Start FancyURLopener with defined version 
class MyOpener(FancyURLopener): 
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
myopener = MyOpener()

# Set count to 0
count= 0

for i in range(0,10):
    # Notice that the start changes for each iteration in order to request a new set of images for each loop
    url = ('https://ajax.googleapis.com/ajax/services/search/images?' + 'v=1.0&q='+searchTerm+'&start='+str(i*4)+'&userip=MyIP')
    print url
    request = urllib2.Request(url, None, {'Referer': 'testing'})
    response = urllib2.urlopen(request)

    # Get results using JSON
    results = simplejson.load(response)
    data = results['responseData']
    # responseData can be None, so stop instead of raising a TypeError
    if data is None:
        break
    dataInfo = data['results']

    # Iterate for each result and get unescaped url
    for myUrl in dataInfo:
        count = count + 1
        print myUrl['unescapedUrl']

        myopener.retrieve(myUrl['unescapedUrl'],str(count)+'.jpg')

    # Sleep for one second to prevent IP blocking from Google
    time.sleep(1)

You can also find very useful information here.
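
The query-string construction and the JSON handling from the loop above can also be sketched with standard-library modules only. This assumes Python 3 (`urllib.parse` and `json` rather than `urllib` and `simplejson`); the helper names `build_search_url` and `extract_urls` are made up for illustration, and since the API has been deprecated the sketch only shows the encoding and parsing steps, not a working search.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://ajax.googleapis.com/ajax/services/search/images"


def build_search_url(term, page, page_size=4):
    """Build the request URL, letting urlencode escape the search term."""
    params = [("v", "1.0"), ("q", term), ("start", page * page_size)]
    return BASE_URL + "?" + urlencode(params)


def extract_urls(payload_text):
    """Return the unescaped image URLs, or an empty list if there is no data."""
    payload = json.loads(payload_text)
    data = payload.get("responseData")
    # responseData may be null in the JSON, which json.loads turns into None
    if not data:
        return []
    return [result["unescapedUrl"] for result in data.get("results", [])]
```

Letting `urlencode` do the escaping covers characters other than spaces (such as `&` or `#`) that a plain `replace(' ', '%20')` would leave untouched.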

Jaime Ivan Cervantes
  • 1
    data might be None sometimes. – itsuper7 May 30 '13 at 13:11
  • 10
    How did this get upvoted? It doesn't answer the OP's question at all. The question was "Please show me some python script that will actually seach google images **with my own image as the search data ('jpeg.jpg' stored on my computer/device)**". – Natsukane Apr 24 '14 at 11:29
  • 2
    Also just as a note, saving images from a google search using their API is a direct violation of their terms and services laid out [here](https://developers.google.com/image-search/terms) – Nick Jarvis Jun 14 '15 at 02:30
  • That API is now deprecated, unfortunately – Radu Mar 30 '17 at 18:12
0

The Google Image Search API is deprecated, so we use a regular Google search instead and extract the image URLs from the results page with a regex and Beautiful Soup:

from bs4 import BeautifulSoup
import re
import urllib2
import os


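def get_soup(url,header):
  # Name the parser explicitly so the result does not depend on which parsers are installed
  return BeautifulSoup(urllib2.urlopen(urllib2.Request(url,headers=header)), 'html.parser')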

image_type = "Action"
# you can change the query for the image  here  
query = "Terminator 3 Movie"
query= query.split()
query='+'.join(query)
url="https://www.google.co.in/search?es_sm=122&source=lnms&tbm=isch&sa=X&ei=4r_cVID3NYayoQTb4ICQBA&ved=0CAgQ_AUoAQ&biw=1242&bih=619&q="+query

print url
header = {'User-Agent': 'Mozilla/5.0'} 
soup = get_soup(url,header)

images = [a['src'] for a in soup.find_all("img", {"src": re.compile("gstatic.com")})]
#print images
for img in images:
  raw_img = urllib2.urlopen(img).read()
  #add the directory for your image here 
  DIR="C:\\Users\\hp\\Pictures\\valentines\\"
  cntr = len([i for i in os.listdir(DIR) if image_type in i]) + 1
  print cntr
  # The with block closes the file even if the write fails
  with open(DIR + image_type + "_"+ str(cntr)+".jpg", 'wb') as f:
    f.write(raw_img)
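
Since the question asks for standard libraries only, the image-extraction step above can be approximated without Beautiful Soup. The following is a minimal sketch assuming Python 3 and its built-in `html.parser`; the class name `ImageSrcParser` and the function `find_image_sources` are made up for illustration, and the substring match on "gstatic.com" simply mirrors the regex used above.

```python
from html.parser import HTMLParser


class ImageSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag that contains a substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # attrs is a list of (name, value) pairs; src may be missing
        src = dict(attrs).get("src")
        if src and self.needle in src:
            self.sources.append(src)


def find_image_sources(html, needle="gstatic.com"):
    """Return the matching image URLs found in an HTML string."""
    parser = ImageSrcParser(needle)
    parser.feed(html)
    return parser.sources
```

The function takes the page as a string, so the bytes returned by the HTTP response have to be decoded before they are passed in.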
rishabhr0y