Scrape Google images based on search term

Question

I want to scrape all the images shown on the Google Images results page for the search term: happiness

I have tried many approaches, but I can fetch only 20 images. Below is my Python code:

import os
import re
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def get_soup(url, header):
    # get_soup was not shown in the original post; this definition is an assumption
    return BeautifulSoup(urlopen(Request(url, headers=header)), "html.parser")


query = input("Search term: ")  # for example: happiness
image_type = "Action"
query = "+".join(query.split())
url = "https://www.google.co.in/search?q=" + query + "&source=lnms&tbm=isch"
print(url)
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
}
soup = get_soup(url, header)

# add the directory for your images here
DIR = os.path.join("Pictures", query)
os.makedirs(DIR, exist_ok=True)

images = [a["src"] for a in soup.find_all("img", {"src": re.compile("gstatic.com")})]
print(images)
print("there are total", len(images), "images")
for img in images:
    raw_img = urlopen(img).read()
    # number the file after the images of this type already saved
    cntr = len([i for i in os.listdir(DIR) if image_type in i]) + 1
    print(cntr)
    with open(os.path.join(DIR, image_type + "_" + str(cntr) + ".jpg"), "wb") as f:
        f.write(raw_img)

Can anybody help me to extract all the images?

Also, the downloaded images are low resolution. How do I download them at a higher resolution? — Dhvani Shah, Jul 26 '17 at 16:35
https://stackoverflow.com/questions/34035422/google-image-search-says-api-no-longer-available — Mayank, Jul 26 '17 at 17:12

score 5 · Answer 1 · answered Dec 12 '18 at 03:09


We built a solution for scraping Google Images. SerpAPI is a web service that converts Google Images results into JSON. We provide an extension for each of the most popular platforms: Python, Ruby, Java, Node.js, etc.


jvmvik


score 1 · Answer 2 · answered Dec 12 '18 at 04:26

Google Images returns only 20 images per request; subsequent results are loaded as you scroll. To control which 20 results are returned, you can use the start parameter in the URL.

For example, this prints the image URLs for the number of results you specify:

import requests
from bs4 import BeautifulSoup

num_res = 400
base_url = "https://www.google.co.in/search?q=happiness&source=lnms&tbm=isch&start={}"

# Request one page of 20 results at a time
for start in range(0, num_res, 20):
    r = requests.get(base_url.format(start))
    soup = BeautifulSoup(r.content, 'lxml')
    # Print the img src values found in the fourth child of the first table
    print([[res.get('src') for res in child.find_all('img')] for child in soup.html.body.table.children][3])
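The same paging idea can be split into two small, separately testable steps: building one URL per page of 20 results, and pulling the img src values out of a page of HTML. The following is only a sketch using the standard library; the names page_urls, ImgSrcParser and extract_image_urls are hypothetical, and it assumes the result page still exposes thumbnails as plain img tags, which may no longer hold.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

BASE_URL = "https://www.google.co.in/search"  # same host as the snippet above
PAGE_SIZE = 20  # results per request, as described in this answer


def page_urls(query, num_results):
    """Return one search URL per page of PAGE_SIZE results."""
    urls = []
    for start in range(0, num_results, PAGE_SIZE):
        params = {"q": query, "source": "lnms", "tbm": "isch", "start": start}
        # urlencode escapes spaces and special characters in the query
        urls.append(BASE_URL + "?" + urlencode(params))
    return urls


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every img tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_urls(html):
    """Return the src values of all img tags in an HTML string."""
    parser = ImgSrcParser()
    parser.feed(html)
    return parser.sources
```

Each URL from page_urls("happiness", 400) can then be fetched with requests.get as above, and r.text passed to extract_image_urls.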

This answer is only meant to satisfy your curiosity; the ideal way to do this is via the Google search APIs.
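As one possible reading of "Google search APIs", here is a rough sketch against the Custom Search JSON API; that this is the API meant is an assumption. It further assumes you have created an API key and a search engine ID (the api_key and engine_id arguments), and the function names are hypothetical. The API returns at most 10 results per request and uses a 1-based start index.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"
RESULTS_PER_REQUEST = 10  # the API caps num at 10


def build_request_url(api_key, engine_id, query, start=1, num=RESULTS_PER_REQUEST):
    """Build an image-search request URL; start is 1-based."""
    params = {
        "key": api_key,       # your API key (assumed to exist)
        "cx": engine_id,      # your search engine ID (assumed to exist)
        "q": query,
        "searchType": "image",
        "start": start,
        "num": num,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def image_links(response):
    """Extract the full-size image URLs from a decoded JSON response."""
    return [item["link"] for item in response.get("items", [])]


def search_images(api_key, engine_id, query, total=30):
    """Page through the results and return up to `total` image URLs."""
    links = []
    for start in range(1, total + 1, RESULTS_PER_REQUEST):
        with urlopen(build_request_url(api_key, engine_id, query, start)) as resp:
            links.extend(image_links(json.load(resp)))
    return links[:total]
```

The link field points at the original image rather than a thumbnail, which also addresses the resolution concern in the question; quota and pricing limits of the API apply.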
