26

This question has been asked numerous times before, but all answers are at least a couple years old and currently based on the ajax.googleapis.com API, which is no longer supported.

Does anyone know of another way? I'm trying to download a hundred or so search results, and in addition to Python APIs I've tried numerous desktop, browser-based, or browser-addon programs for doing this which all failed.

desertnaut
xanderflood
  • Have you tried Selenium? – Morgan G Mar 05 '16 at 03:30
  • Selenium solved it! I used the code https://simplypython.wordpress.com/2015/05/18/saving-images-from-google-search-using-selenium-and-python/, with a slight change to the scrolling code. (Jumping directly to the bottom of the page does *not* necessarily cause a lazy-loaded page to load up all the images, so I made it scroll gradually.) – xanderflood Mar 05 '16 at 15:53
  • https://github.com/hardikvasa/google-images-download – hnvasa Apr 09 '18 at 04:29

10 Answers

15

Make sure you install the icrawler library first, then use:

pip install icrawler

from icrawler.builtin import GoogleImageCrawler

google_crawler = GoogleImageCrawler(storage={'root_dir': r'write the name of the directory you want to save to here'})
google_crawler.crawl(keyword='sad human faces', max_num=800)
Ru Chern Chong
10

Use Google Custom Search to achieve what you want. See @i08in's answer to "Python - Download Images from google Image search?"; it has a great description, script samples, and library references.
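As a minimal Python 3 sketch (not from the linked answer) of what a Custom Search JSON API image query looks like; `API_KEY` and `CX_ID` are placeholders you must create in the developer console:

```python
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: create one in the developer console
CX_ID = "YOUR_CX_ID"      # placeholder: your custom search engine id


def build_url(query, start=1):
    """Build a Custom Search JSON API URL for an image search."""
    params = urllib.parse.urlencode({
        "key": API_KEY,
        "cx": CX_ID,
        "q": query,
        "searchType": "image",
        "start": start,  # 1-based offset; the API returns at most 10 per page
    })
    return "https://www.googleapis.com/customsearch/v1?" + params


def image_urls(query, start=1):
    """Fetch one page of results and return the direct image links."""
    with urllib.request.urlopen(build_url(query, start)) as resp:
        items = json.load(resp).get("items", [])
    return [item["link"] for item in items]
```

Since each query returns at most 10 results, collecting a hundred images means paging through `start=1, 11, 21, ...`, and the free tier's daily query quota applies.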

Andriy Ivaneyko
  • I'm accepting this because it definitely answers the question! I also want to point out that Google's APIs have restrictions designed to inhibit people from using them to, for instance, automate the collection of search results as I am trying to do, so this approach may run into permission issues. @Morgan G's suggestion to use Selenium worked great for me! – xanderflood Mar 05 '16 at 15:50
7

To download any number of images from Google image search using Selenium:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import os
import json
import urllib2
import sys
import time

# adding path to geckodriver to the OS environment variable
# assuming that it is stored at the same path as this script
os.environ["PATH"] += os.pathsep + os.getcwd()
download_path = "dataset/"

def main():
    searchtext = sys.argv[1] # the search query
    num_requested = int(sys.argv[2]) # number of images to download
    number_of_scrolls = num_requested / 400 + 1 
    # number_of_scrolls * 400 images will be opened in the browser

    if not os.path.exists(download_path + searchtext.replace(" ", "_")):
        os.makedirs(download_path + searchtext.replace(" ", "_"))

    url = "https://www.google.co.in/search?q="+searchtext+"&source=lnms&tbm=isch"
    driver = webdriver.Firefox()
    driver.get(url)

    headers = {}
    headers['User-Agent'] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
    extensions = {"jpg", "jpeg", "png", "gif"}
    img_count = 0
    downloaded_img_count = 0

    for _ in xrange(number_of_scrolls):
        for __ in xrange(10):
            # multiple scrolls needed to show all 400 images
            driver.execute_script("window.scrollBy(0, 1000000)")
            time.sleep(0.2)
        # to load next 400 images
        time.sleep(0.5)
        try:
            driver.find_element_by_xpath("//input[@value='Show more results']").click()
        except Exception as e:
            print "Fewer images found:", e
            break

    # images = driver.find_elements_by_xpath('//div[@class="rg_meta"]') # not working anymore
    images = driver.find_elements_by_xpath('//div[contains(@class,"rg_meta")]')
    print "Total images:", len(images), "\n"
    for img in images:
        img_count += 1
        meta = json.loads(img.get_attribute('innerHTML'))
        img_url = meta["ou"]   # "ou" holds the original image URL
        img_type = meta["ity"] # "ity" holds the image type/extension
        print "Downloading image", img_count, ": ", img_url
        try:
            if img_type not in extensions:
                img_type = "jpg"
            req = urllib2.Request(img_url, headers=headers)
            raw_img = urllib2.urlopen(req).read()
            f = open(download_path+searchtext.replace(" ", "_")+"/"+str(downloaded_img_count)+"."+img_type, "wb")
            f.write(raw_img)
            f.close()
            downloaded_img_count += 1
        except Exception as e:
            print "Download failed:", e
        finally:
            print
        if downloaded_img_count >= num_requested:
            break

    print "Total downloaded: ", downloaded_img_count, "/", img_count
    driver.quit()

if __name__ == "__main__":
    main()

Full code is here.

atif93
6

Improving a bit on Ravi Hirani's answer, the simplest way is:

from icrawler.builtin import GoogleImageCrawler

google_crawler = GoogleImageCrawler(storage={'root_dir': 'D:\\projects\\data core\\helmet detection\\images'})
google_crawler.crawl(keyword='cat', max_num=100)

Source : https://pypi.org/project/icrawler/

Soumya Boral
3

How about this one?

https://github.com/hardikvasa/google-images-download

It allows you to download hundreds of images and has a ton of filters to customize your search.


If you want to download more than 100 images per keyword, you will need to install 'selenium' along with 'chromedriver'.

If you installed the library via pip or ran the setup.py file, Selenium will have been installed on your machine automatically. You will also need the Chrome browser on your machine. For chromedriver:

Download the correct chromedriver based on your operating system.

On Windows or macOS, if chromedriver gives you trouble for some reason, download it into the current directory and run the command.

On Windows, however, the path to chromedriver has to be given in the following format:

C:\complete\path\to\chromedriver.exe

On Linux, if you are having issues installing the Google Chrome browser, refer to this CentOS or Amazon Linux Guide or Ubuntu Guide.

For all operating systems, you will have to use the '--chromedriver' or '-cd' argument to specify the path of the chromedriver that you have downloaded to your machine.
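As a hedged sketch of the Python-library route (the chromedriver path is a placeholder to adjust per OS, and the download call is commented out since it requires the library, Chrome, and a network connection):

```python
# Sketch: requesting more than 100 images, so the "chromedriver"
# argument must point at the driver you downloaded.
arguments = {
    "keywords": "car",
    "limit": 500,                             # > 100 triggers the Selenium path
    "chromedriver": "/path/to/chromedriver",  # placeholder: adjust per OS
}

# from google_images_download import google_images_download
# response = google_images_download.googleimagesdownload()
# paths = response.download(arguments)
```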

Vishal Gupta
hnvasa
  • This only allows up to 100 images to be downloaded. – abggcv Apr 23 '18 at 10:40
  • Using chromedriver you can download hundreds of images from the above library... it is not limited to just 100. Instructions are in the README file. :) – hnvasa Jun 24 '19 at 18:45
  • Is there some way to get this to stop skipping images that don't have an image format? (e.g. https://partycity6.scene7.com/is/image/PartyCity/_pdp_sq_?$_1000x1000_$&$product=PartyCity/237864) and instead to download them some other way? – Brandon Oct 14 '19 at 04:44
2

I have been using this script to download images from Google search, and I have been using them to train my classifiers. The code below can download 100 images related to the query:

from bs4 import BeautifulSoup
import requests
import re
import urllib2
import os
import cookielib
import json

def get_soup(url,header):
    return BeautifulSoup(urllib2.urlopen(urllib2.Request(url,headers=header)),'html.parser')


query = raw_input("query image: ")  # you can change the query for the image here
image_type="ActiOn"
query= query.split()
query='+'.join(query)
url="https://www.google.co.in/search?q="+query+"&source=lnms&tbm=isch"
print url
#add the directory for your image here
DIR="Pictures"
header={'User-Agent':"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
}
soup = get_soup(url,header)


ActualImages=[]# contains the link for Large original images, type of  image
for a in soup.find_all("div",{"class":"rg_meta"}):
    link , Type =json.loads(a.text)["ou"]  ,json.loads(a.text)["ity"]
    ActualImages.append((link,Type))

print "there are total", len(ActualImages), "images"

if not os.path.exists(DIR):
    os.mkdir(DIR)

DIR = os.path.join(DIR, query.split()[0])

if not os.path.exists(DIR):
    os.mkdir(DIR)
###print images
for i , (img , Type) in enumerate( ActualImages):
    try:
        req = urllib2.Request(img, headers=header)  # header is already a dict
        raw_img = urllib2.urlopen(req).read()

        cntr = len([i for i in os.listdir(DIR) if image_type in i]) + 1
        print cntr
        if len(Type)==0:
            f = open(os.path.join(DIR , image_type + "_"+ str(cntr)+".jpg"), 'wb')
        else :
            f = open(os.path.join(DIR , image_type + "_"+ str(cntr)+"."+Type), 'wb')


        f.write(raw_img)
        f.close()
    except Exception as e:
        print "could not load : "+img
        print e
rishabhr0y
1

I've been trying this library, which can be used as both a command-line tool and a Python library. It has lots of arguments for finding images with different criteria.

These are examples taken from its documentation. To use it as a Python library:

from google_images_download import google_images_download   #importing the library

response = google_images_download.googleimagesdownload()   #class instantiation

arguments = {"keywords":"Polar bears,baloons,Beaches","limit":20,"print_urls":True}   #creating list of arguments
paths = response.download(arguments)   #passing the arguments to the function
print(paths)   #printing absolute paths of the downloaded images

Or as a command-line tool:

$ googleimagesdownload --k "car" -sk 'red,blue,white' -l 10

You can install this with pip install google_images_download

Rodrigo Laguna
1

A simple solution to this problem is to install a python package called google_images_download

pip install google_images_download

Use this Python code:

from google_images_download import google_images_download  

response = google_images_download.googleimagesdownload()
keywords = "apple fruit"
arguments = {"keywords":keywords,"limit":20,"print_urls":True}
paths = response.download(arguments)
print(paths)

Adjust the limit to control the number of images to download.

Note that some images won't open, as they might be corrupt.

Change the keywords string to get the output you need.

Avin_ash
0

You need to use the custom search API. There is a handy explorer here. I use urllib2. You also need to create an API key for your application from the developer console.
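As a minimal sketch of the parsing step (the response body below is a hand-made stand-in mirroring the API's documented "items" list, not actual output):

```python
import json

# A hand-made stand-in for one page of a Custom Search API response.
sample_body = json.dumps({
    "items": [
        {"link": "https://example.com/a.jpg", "mime": "image/jpeg"},
        {"link": "https://example.com/b.png", "mime": "image/png"},
    ]
})


def extract_links(body):
    """Return the direct image URLs from one API response page."""
    return [item["link"] for item in json.loads(body).get("items", [])]
```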

  • A better solution would be to wrap hardikvasa's code up in an API by changing the code to run from a class instead of a standalone Python script. That way, no API key is required. API keys are all well and good, but they are just another obstacle to testing. – Eamonn Kenny Mar 23 '18 at 13:07
0

I have tried many codes, but none of them worked for me. I am posting my working code here; hope it will help others.

I am using Python version 3.6 and used icrawler.

First, you need to install icrawler on your system.

Then run the code below:

from icrawler.examples import GoogleImageCrawler
google_crawler = GoogleImageCrawler()
google_crawler.crawl(keyword='krishna', max_num=100)

Replace keyword krishna with your desired text.

Note: the downloaded images need a path. Right now it uses the same directory where the script is placed. You can set a custom directory via the code below:

google_crawler = GoogleImageCrawler('path_to_your_folder')
Ravi Hirani