
I want to fetch Google Images results for any query. I have gone through the Google Image Search API but I am unable to understand it. I have also seen some methods that fetch images, but only from the first page. I have used the following method:

function getGoogleImg($k)
{
    // Google Images results page (Italian locale); ##query## is replaced below
    $url = "http://images.google.it/images?as_q=##query##&hl=it&imgtbs=z&btnG=Cerca+con+Google&as_epq=&as_oq=&as_eq=&imgtype=&imgsz=m&imgw=&imgh=&imgar=&as_filetype=&imgc=&as_sitesearch=&as_rights=&safe=images&as_st=y";
    $web_page = file_get_contents(str_replace("##query##", urlencode($k), $url));
    // Keep only the argument list of the embedded dyn.setResults(...) call
    $tieni = stristr($web_page, "dyn.setResults(");
    $tieni = str_replace("dyn.setResults(", "", str_replace(stristr($tieni, ");"), "", $tieni));
    $tieni = str_replace("[]", "", $tieni);
    // Split the JavaScript array literal on its brackets
    $m = preg_split("/[\[\]]/", $tieni);
    $x = array();
    for ($i = 0; $i < count($m); $i++)
    {
        // Strip the /imgres?imgurl= prefix and everything from &imgrefurl on
        $m[$i] = str_replace("/imgres?imgurl\\x3d", "", $m[$i]);
        $m[$i] = str_replace(stristr($m[$i], "\\x26imgrefurl"), "", $m[$i]);
        $m[$i] = preg_replace("/^\"/i", "", $m[$i]);
        $m[$i] = preg_replace("/^,/i", "", $m[$i]);
        if ($m[$i] != "")
        {
            array_push($x, $m[$i]);
        }
    }
    return $x;
}

This function returns only 21 images. I want all the images for the query. I am doing this in PHP.
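To make it clearer what getGoogleImg actually does, here is a rough Python 3 port of its string slicing. It is a sketch only: parse_set_results is a hypothetical name, the sample page string in the usage check is made up, and the dyn.setResults format it expects is what the PHP code assumes, which likely no longer matches current Google pages. The point it illustrates is that the function can only ever return the URLs present in that single HTML response.

```python
import re

def parse_set_results(web_page):
    """Rough Python port of the string slicing in getGoogleImg (sketch only).

    Matching is case-sensitive here, unlike PHP's stristr.
    """
    marker = "dyn.setResults("
    start = web_page.find(marker)
    if start == -1:
        return []                       # no embedded results call on this page
    body = web_page[start + len(marker):]
    end = body.find(");")
    if end != -1:
        body = body[:end]               # keep only the call's argument list
    body = body.replace("[]", "")       # drop empty arrays
    urls = []
    for piece in re.split(r"[\[\]]", body):
        # strip the "/imgres?imgurl=" prefix (escaped as \x3d in the page)
        piece = piece.replace("/imgres?imgurl\\x3d", "")
        cut = piece.find("\\x26imgrefurl")
        if cut != -1:
            piece = piece[:cut]         # cut everything from "&imgrefurl" on
        piece = re.sub(r'^"', "", piece)
        piece = re.sub(r"^,", "", piece)
        if piece:
            urls.append(piece)
    return urls
```

Like the PHP version, it keeps every non-empty fragment, so on a real page some entries may not be URLs at all.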

Awais Qarni
  • Have you ever seen a Google results page that gives you all (possibly millions of) results? Anyway, you had better use the Image Search API: http://code.google.com/intl/de/apis/imagesearch/ – Dr.Molle Feb 12 '11 at 12:48
  • Yes, Dr.Molle. But it returns only some of them, not all, since we cannot scrape Google Images. – Awais Qarni Feb 12 '11 at 12:50
  • Please show me one of those result pages with all the results (with more than 20 images, of course). – Dr.Molle Feb 12 '11 at 12:53
  • How can I show you? I have used the function above to get the images, and it only returns the src of 21 images. – Awais Qarni Feb 12 '11 at 12:54
  • Simply post the URL of one Google image search results page that includes all (or more than 20) images; I guess you know one, if you have seen some. – Dr.Molle Feb 12 '11 at 12:58
  • Google does not allow its search engine to be used this way. All automated requests should be performed via the API. – zerkms Feb 12 '11 at 12:59
  • See the answers to this Stack Overflow question: Download first 1000 images from google search (http://stackoverflow.com/questions/11524218/download-first-1000-images-from-google-search/12424268) – LeMoussel Sep 14 '12 at 12:15

2 Answers


Sadly, the Image API is being shut down, so I won't suggest moving to it, but I think it would have been a nicer solution.

My best guess is that images 22 onwards are loaded by some AJAX/JavaScript code (if you search for, say, "logo" and scroll down, you will see placeholders that get filled in as you move down). In that case you would need to run the page through a JavaScript engine, and I can't find anyone who has done that with PHP (yet).

Have you checked whether $web_page actually contains more than 21 images? (When I experiment with Google image search, it uses JavaScript to load some of the images.) What happens when you open the link in your normal browser, and what happens if you turn off JavaScript? Is there perhaps a link to the next page in the result you get?
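One quick way to do that check is to count the distinct imgurl parameters in the raw HTML. A minimal Python 3 sketch; the regex is an assumption based on the /imgres?imgurl patterns the question's PHP code strips out (plain "imgurl=" or JavaScript-escaped "imgurl\x3d"), and count_image_urls is a hypothetical helper:

```python
import re

# Assumed pattern: an imgurl parameter written either plainly ("imgurl=")
# or JavaScript-escaped ("imgurl\x3d"), ending at "&", a backslash or a quote.
IMGURL_RE = re.compile(r'imgurl(?:=|\\x3d)([^&\\"]+)')

def count_image_urls(page):
    """Count the distinct image URLs referenced in a fetched results page."""
    return len(set(IMGURL_RE.findall(page)))
```

If this reports about 21 for the fetched page, the remaining images are not in the HTML at all, and no amount of parsing will find them.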

The now-deprecated Image API had ways to limit the number of results per page and to step to the next page: https://developers.google.com/image-search/v1/jsondevguide#json_snippets_php
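As I recall that v1 JSON guide, a request took v=1.0, the query q, a page size rsz (at most 8) and a result offset start; treat the endpoint and these parameter names as assumptions to check against the guide. Since the service has been closed, this Python 3 sketch only builds the paged URLs (page_urls is a hypothetical helper) to show how stepping through pages worked:

```python
from urllib.parse import urlencode

# Endpoint of the former Google AJAX Image Search API (v1.0, now shut down).
BASE = "https://ajax.googleapis.com/ajax/services/search/images"

def page_urls(query, pages, per_page=8):
    """Build one request URL per page: rsz = results per page, start = offset."""
    urls = []
    for page in range(pages):
        params = {"v": "1.0", "q": query, "rsz": per_page, "start": page * per_page}
        urls.append(BASE + "?" + urlencode(params))
    return urls
```

Each page's offset is simply the page index times the page size, so page_urls("terminator 3", 3) asks for results 0, 8 and 16 onwards.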

If you want to keep doing searches and fetching images from the search results, http://simplehtmldom.sourceforge.net/ might be a nice alternative to look at going forward. It builds an HTML DOM and makes it easy to find nodes and work with them. But it still uses file_get_contents or the cURL library to fetch the data, so it might need some fiddling to get JavaScript working.
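simplehtmldom itself is a PHP library. As a rough Python 3 analogue of "parse the page and find every img node", the standard library's html.parser can collect img src attributes (ImgSrcCollector and img_srcs are hypothetical names). Like simplehtmldom, it only sees the HTML that was fetched and does not run JavaScript, so lazily loaded images will still be missing:

```python
from html.parser import HTMLParser

class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)

def img_srcs(html):
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs
```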

BenMorel
Jontas

I wrote a script to download images from Google Image search; I currently use it to download 100 original images.

I originally posted the script as a Stack Overflow answer to

Python - Download Images from google Image search?

Below I explain in detail how I scrape the URLs of the original images from Google Image search using urllib2 and BeautifulSoup.

For example, if you want to scrape images of the movie Terminator 3 from Google image search:

import urllib2
from bs4 import BeautifulSoup

query = "Terminator 3"
query = '+'.join(query.split())  # this makes the query Terminator+3
url = "https://www.google.co.in/search?q=" + query + "&source=lnms&tbm=isch"
header = {'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"}
req = urllib2.Request(url, headers=header)  # send a browser-like User-Agent
response = urllib2.urlopen(req)
soup = BeautifulSoup(response, "html.parser")  # parse the returned HTML
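Joining the words with '+' only handles spaces; a query containing characters such as & or # would break the URL. A Python 3 sketch that escapes the query with the standard library's quote_plus instead (image_search_url is a hypothetical helper; the URL parameters are copied from the snippet above):

```python
from urllib.parse import quote_plus

def image_search_url(query):
    # quote_plus turns spaces into "+" and percent-encodes characters like "&"
    # or "#", which would otherwise end the q parameter or start a fragment.
    return "https://www.google.co.in/search?q=" + quote_plus(query) + "&source=lnms&tbm=isch"
```

For "Terminator 3" this produces the same URL as the original snippet.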

The variable soup above contains the HTML of the requested page. Now we need to extract the images. To do that, open the web page in your browser and use Inspect Element on an image.

There you will find the tags that contain the image URL.

For Google Images, for example, I found that "div",{"class":"rg_meta"} contains the link to the image.

You can look up find_all in the BeautifulSoup documentation:

print soup.find_all("div",{"class":"rg_meta"})

You will get a list of results such as:

<div class="rg_meta">{"cl":3,"cr":3,"ct":12,"id":"C0s-rtOZqcJOvM:","isu":"emuparadise.me","itg":false,"ity":"jpg","oh":540,"ou":"http://199.101.98.242/media/images/66433-Terminator_3_The_Redemption-1.jpg","ow":960,"pt":"Terminator 3 The Redemption ISO \\u0026lt; GCN ISOs | Emuparadise","rid":"VJSwsesuO1s1UM","ru":"http://www.emuparadise.me/Nintendo_Gamecube_ISOs/Terminator_3_The_Redemption/66433","s":"Screenshot Thumbnail / Media File 1 for Terminator 3 The Redemption","th":168,"tu":"https://encrypted-tbn2.gstatic.com/images?q\\u003dtbn:ANd9GcRs8dp-ojc4BmP1PONsXlvscfIl58k9hpu6aWlGV_WwJ33A26jaIw","tw":300}</div>

The "ou" field in the result above contains the URL of our image:

http://199.101.98.242/media/images/66433-Terminator_3_The_Redemption-1.jpg
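If BeautifulSoup is not available, the same rg_meta lookup and the "ou" extraction can be sketched with only the Python 3 standard library. RgMetaParser and original_images are hypothetical names; the sketch assumes each rg_meta div holds one JSON object and contains no nested div, as in the sample above:

```python
import json
from html.parser import HTMLParser

class RgMetaParser(HTMLParser):
    """Collects the JSON text of every <div class="rg_meta"> element."""

    def __init__(self):
        super().__init__()
        self.in_meta = False   # True while inside an rg_meta div
        self.chunks = []       # text pieces of the current div
        self.texts = []        # one JSON string per rg_meta div

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "rg_meta" in classes:
            self.in_meta = True
            self.chunks = []

    def handle_data(self, data):
        if self.in_meta:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        # Assumes rg_meta divs contain no nested <div> elements
        if tag == "div" and self.in_meta:
            self.texts.append("".join(self.chunks))
            self.in_meta = False

def original_images(html):
    """Return (original URL "ou", image type "ity") for each rg_meta div."""
    parser = RgMetaParser()
    parser.feed(html)
    parser.close()
    result = []
    for text in parser.texts:
        meta = json.loads(text)
        result.append((meta["ou"], meta.get("ity", "")))
    return result
```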

You can extract these links and download the images as follows:

import json
import os

DIR = "Pictures"           # placeholder: folder to save into (not defined in the excerpt)
image_type = "Terminator"  # placeholder: file-name prefix (not defined in the excerpt)

ActualImages = []  # (link to the large original image, image type)
for a in soup.find_all("div", {"class": "rg_meta"}):
    meta = json.loads(a.text)  # parse the JSON once per div
    ActualImages.append((meta["ou"], meta.get("ity", "")))

if not os.path.exists(DIR):
    os.mkdir(DIR)

for img, Type in ActualImages:
    try:
        req = urllib2.Request(img, headers=header)  # header is already a dict
        raw_img = urllib2.urlopen(req).read()
        # number each file after those already saved with this prefix
        cntr = len([name for name in os.listdir(DIR) if image_type in name]) + 1
        print cntr
        ext = Type if Type else "jpg"  # fall back to .jpg when the type is empty
        path = os.path.join(DIR, image_type + "_" + str(cntr) + "." + ext)
        with open(path, 'wb') as f:
            f.write(raw_img)
    except Exception as e:
        print "could not load : " + img
        print e
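The file-naming part of the loop above (count the files already saved with the prefix, add one, and fall back to .jpg when no type is given) can be pulled out into a small pure function so it can be checked in isolation. A Python 3 sketch; next_filename is a hypothetical helper, not part of the original script:

```python
def next_filename(existing_files, prefix, ext):
    """Pick the next numbered file name, e.g. prefix_3.jpg.

    existing_files: names already in the target folder (e.g. os.listdir(DIR)).
    ext: image type from the "ity" field; empty means fall back to "jpg".
    """
    count = sum(1 for name in existing_files if prefix in name) + 1
    return "{}_{}.{}".format(prefix, count, ext or "jpg")
```

Because the number comes from counting matching names, deleting a saved file can make the next name collide with one that still exists and overwrite it; keeping a running counter, or checking whether the path exists, avoids that.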

Voilà: now you can use this script to download images from Google search, or to collect training images.

You can get the fully working script here:

https://gist.github.com/rishabhsixfeet/8ff479de9d19549d5c2d8bfc14af9b88

marc_s
rishabhr0y