I have a requirement where I have to do a reverse image lookup on Google and extract the name printed after the "Best guess for this image:" title. I made some modifications to existing cURL code I found on the net and came this far:

<?php

function fetch_google($terms="sample search",$numpages=1,$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')  
{
    $searched="";
    for($i=0;$i<=$numpages;$i++)
    {
        $ch = curl_init();
        $url="http://www.google.com/searchbyimage?hl=en&image_url=".urlencode($terms);
        curl_setopt ($ch, CURLOPT_URL, $url);
        curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt ($ch, CURLOPT_HEADER, 0);
        curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
        curl_setopt ($ch,CURLOPT_CONNECTTIMEOUT,120);
        curl_setopt ($ch,CURLOPT_TIMEOUT,120);
        curl_setopt ($ch,CURLOPT_MAXREDIRS,10);
        curl_setopt ($ch,CURLOPT_COOKIEFILE,"cookie.txt");
        curl_setopt ($ch,CURLOPT_COOKIEJAR,"cookie.txt");
        $searched=$searched.curl_exec ($ch);
        curl_close ($ch);
    }

    $xml = new DOMDocument();
    @$xml->loadHTML($searched);
    foreach($xml->getElementsByTagName('div') as $div)
    {
        if(strpos($div->nodeValue,"Best guess for this image:"))
            return $div->nodeValue;
    } 
}

$content = fetch_google("http://media.il.edmunds-media.com/aston-martin/as/03/de/aston-martin_front_03-de-as_1_276.jpg",1);
echo $content."<br>";

?>

But it gives me lots of text, and I am not able to get the exact div for it. Since the 'a' does not have a class attribute, I had to do it this way.

Please help!

Suyash
  • Are you able to take a look at [this question](http://stackoverflow.com/q/14953867/1311910) with similar context to yours, and shed some light on how to solve the issue please? – 7usam Feb 19 '13 at 22:48

2 Answers

You could use preg_match instead.

Since cURL returns the HTML as a string, you can match the text with a regular expression rather than walking the DOM:

function fetch_google($terms="sample search",$numpages=1,$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')  
{
    $searched="";
    for($i=0;$i<$numpages;$i++)
    {
        $ch = curl_init();
        $url="http://www.google.com/searchbyimage?hl=en&image_url=".urlencode($terms);
        curl_setopt ($ch, CURLOPT_URL, $url);
        curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt ($ch, CURLOPT_HEADER, 0);
        curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
        curl_setopt ($ch,CURLOPT_CONNECTTIMEOUT,120);
        curl_setopt ($ch,CURLOPT_TIMEOUT,120);
        curl_setopt ($ch,CURLOPT_MAXREDIRS,10);
        curl_setopt ($ch,CURLOPT_COOKIEFILE,"cookie.txt");
        curl_setopt ($ch,CURLOPT_COOKIEJAR,"cookie.txt");
        $searched=$searched.curl_exec ($ch);
        curl_close ($ch);
    }

    $matches = array();
    preg_match('/Best guess for this image:[^<]+<a[^>]+>([^<]+)/', $searched, $matches);
    return (count($matches) > 1 ? $matches[1] : false);
}
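
To see what the pattern captures without calling Google, here is a minimal offline sketch. The HTML string is a hypothetical stand-in for the result page (an assumption, not captured output), and extract_best_guess() is an illustrative helper name that is not part of the original code.

```php
<?php
// Hypothetical markup: an assumed shape for the "best guess" block,
// not real output captured from Google.
$html = '<div id="topstuff"><div>Best guess for this image: '
      . '<a href="/search?q=aston+martin+vantage">aston martin vantage</a>'
      . '</div></div>';

// Illustrative helper: returns the text of the first link that follows
// the label, or false when the label or the link is missing.
function extract_best_guess($html)
{
    $matches = array();
    // Same pattern as in the answer: label, at least one non-"<" character,
    // an opening <a ...> tag, then the link text up to the next "<".
    $pattern = '/Best guess for this image:[^<]+<a[^>]+>([^<]+)/';
    if (preg_match($pattern, $html, $matches) === 1) {
        return html_entity_decode(trim($matches[1]));
    }
    return false;
}

var_dump(extract_best_guess($html));
var_dump(extract_best_guess('<p>no label here</p>'));
```

Note that the pattern expects at least one character (for example a space) between the colon and the link, so it returns false if the real page puts the tag immediately after the label.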
Gavin
  • It works! But I always read on Stack Overflow not to use regex and to use DOM only. – Suyash May 17 '12 at 11:53
  • Each has its pros and cons, but in this situation I would always use regex. DOMDocument is fine, but when you are loading external content that you have no control over, any mistake in their markup can break your code. Regex is extremely flexible: the example I gave simply looks for "Best guess for this image: aston martin vantage" and doesn't care about the rest of the content. Of course, they could change this, but when that happens, it will take you far longer to update your DOMDocument method than the regex method. HTH – Gavin May 17 '12 at 12:02

If you want to use DOMDocument, you can get the value with the following modification.

<?php

function fetch_google($terms="sample search",$numpages=1,$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')  
{
    $searched="";
    for($i=0;$i<$numpages;$i++)
    {
        $ch = curl_init();
        $url="http://www.google.com/searchbyimage?hl=en&image_url=".urlencode($terms);
        curl_setopt ($ch, CURLOPT_URL, $url);
        curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt ($ch, CURLOPT_HEADER, 0);
        curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
        curl_setopt ($ch,CURLOPT_CONNECTTIMEOUT,120);
        curl_setopt ($ch,CURLOPT_TIMEOUT,120);
        curl_setopt ($ch,CURLOPT_MAXREDIRS,10);
        curl_setopt ($ch,CURLOPT_COOKIEFILE,"cookie.txt");
        curl_setopt ($ch,CURLOPT_COOKIEJAR,"cookie.txt");
        $searched=$searched.curl_exec ($ch);
        curl_close ($ch);
    }

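    $xml = new DOMDocument();
    @$xml->loadHTML($searched); // suppress warnings from malformed markup
    $last = false; // returned as-is when no matching link is found
    $topblock = $xml->getElementById('topstuff');
    if($topblock)
    {
        foreach($topblock->getElementsByTagName('div') as $div){
            // only look at divs whose text mentions the "guess" label
            if(strstr(strtolower($div->nodeValue), "guess")){
                foreach($div->getElementsByTagName('a') as $a){
                    $last = $a->nodeValue; // keep the text of the last link
                }
            }
        }
    }

    return $last;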
}

$content = fetch_google($_GET['img'],1);
echo $content."<br>";

?>
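
The nested loops above can also be written as a single XPath query. This is a rough sketch under stated assumptions: the id "topstuff" comes from the code above, but the rest of the sample markup is hypothetical, and best_guess_from_dom() is an illustrative name rather than anything from the original answer. It requires PHP's DOM extension.

```php
<?php
// Hypothetical markup: only the "topstuff" id is taken from the answer;
// the surrounding structure is an assumption for demonstration.
$html = '<html><body><div id="topstuff"><div>Best guess for this image: '
      . '<a href="/search?q=x">aston martin vantage</a></div></div>'
      . '<div id="res"><a href="/other">unrelated link</a></div></body></html>';

// Illustrative helper: returns the text of the last link inside a
// "guess" div under #topstuff, or null when nothing matches.
function best_guess_from_dom($html)
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Collect libxml parse errors internally instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = $xpath->query('//div[@id="topstuff"]//div[contains(., "Best guess")]//a');
    if ($links === false || $links->length === 0) {
        return null;
    }
    // Mirror the loop in the answer by taking the last matching link.
    return trim($links->item($links->length - 1)->nodeValue);
}

var_dump(best_guess_from_dom($html));
```

Restricting the query to the topstuff block is what keeps unrelated links elsewhere on the page out of the result.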