
Does anyone know whether it is possible to display the Google PageRank of a particular website using a PHP script?

If it is possible, how do I do it?

Smith
  • Are you referring to PageRank (PR) or the actual location of a listing in the results? – Cody Snider Aug 28 '10 at 04:07
  • You should read http://stackoverflow.com/faq#howtoask and consider marking some of the answers to your questions as **accepted**, and also use **voting** to vote up all answers that were helpful to you. – Oleg Aug 28 '10 at 10:36
  • @Oleg: most of the questions did not get the answer I was looking for. Should I mark them as unhelpful? @Cody Snider: I am referring to the actual PageRank (PR) number between 0 and 10. – Smith Aug 28 '10 at 12:18

1 Answer


Okay, I rewrote my answer and extracted only the relevant part of my SEO helper. (My previous version had other stuff in it, like Alexa Rank, Google Index, Yahoo Links, etc. If you are looking for that, just check an older revision of this answer!)

Please be aware that there are pages that have NO PAGERANK, and by "no" I DON'T MEAN ZERO. There is just none. This may be because the page is very unimportant (even less important than PR 0), or because it is so new that it has not been ranked yet even though it may well be important. This is considered the same as PR 0 in my class!

This has pros and cons. If possible, you should handle it separately in your logic, but that is not always possible, so 0 is the next best approach.
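If you do want to keep "no PageRank" apart from PR 0, one option is to return null for the missing case. The following is only a sketch: `parseToolbarRank` is a hypothetical helper that is not part of my class, and the `Rank_1:1:N` body format is an assumption inferred from the `Rank_` search and the offset of 9 used in the code below.

```php
<?php
// Hypothetical helper (not part of Helper_Seo).
// Assumption: a ranked page yields a body line such as "Rank_1:1:5",
// while an unranked page yields a body without any "Rank_" line.
function parseToolbarRank(string $body): ?int
{
    // Capture the digits after the second colon.
    if (preg_match('/Rank_\d+:\d+:(\d+)/', $body, $matches) === 1) {
        return (int) $matches[1];
    }
    // No rank line at all: report "no PageRank" as null, not as 0.
    return null;
}
```

Callers can then test for `=== null` when they care about the difference and fall back to 0 when they do not.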

Furthermore:

This code is reverse engineered and does not use any API that comes with an SLA or any other guarantee. So it might stop working AT ANY TIME!

And PLEASE DON'T FLOOD GOOGLE!

I tested this. With only a very short sleep between requests, Google blocks you after about 1000 requests (for quite some time!). With a random sleep between 1.5 and 2 seconds it looks fine.
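A random pause like that could be sketched roughly as follows. The names `randomDelayMicros` and `lookupAll` are hypothetical and made up for this illustration. The lookup and the sleep function are passed in as callables so the loop can be tried without hitting Google.

```php
<?php
// Hypothetical helper: a random delay between 1.5 s and 2 s, in microseconds.
function randomDelayMicros(): int
{
    return random_int(1500000, 2000000);
}

// Hypothetical batch runner. $lookup is any callable that takes a URL;
// $sleep defaults to usleep() and is injectable for testing.
function lookupAll(array $urls, callable $lookup, ?callable $sleep = null): array
{
    $sleep = $sleep ?? 'usleep';
    $urls = array_values($urls);
    $last = count($urls) - 1;
    $results = [];
    foreach ($urls as $i => $url) {
        $results[$url] = $lookup($url);
        // Pause between requests, but not after the final one.
        if ($i < $last) {
            $sleep(randomDelayMicros());
        }
    }
    return $results;
}

// Possible usage (performs real requests, so it is left commented out):
// $ranks = lookupAll($urls, ['Helper_Seo', 'getPageRank']);
```

The delay bounds are just the 1.5 to 2 second range from my test; nothing guarantees that Google keeps tolerating that rate.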

I once crawled the PageRank for 70k pages. Only once, because I just needed it. I did only 5k a day from several IPs, and now I have the data and it doesn't get outdated because the pages have been there for decades.

IMO it's totally OK to check a PageRank once in a while, or even several at once, but don't misuse this code or Google may lock us out altogether!

<?php
/*
 * @author Joe Hopfgartner <joe@2x.to>
 */
class Helper_Seo
{

    // Declared static because it is called via self:: from the static getPageRank().
    protected static function _pageRankStrToNum($Str,$Check,$Magic) {
        $Int32Unit=4294967296;
        // 2^32
        $length=strlen($Str);
        for($i=0;$i<$length;$i++) {
            $Check*=$Magic;
            //If the float is beyond the boundaries of integer (usually +/- 2.15e+9 = 2^31),
            // the result of converting to integer is undefined
            if($Check>=$Int32Unit) {
                $Check=($Check-$Int32Unit*(int)($Check/$Int32Unit));
                //if the check less than -2^31
                $Check=($Check<-2147483648)?($Check+$Int32Unit):$Check;
            }
            // Square brackets: the curly-brace string offset syntax was removed in PHP 8.
            $Check+=ord($Str[$i]);
        }
        return $Check;
    }
    /*
    * Generate a hash for a URL
    */
    protected static function _pageRankHashURL($String) {
        $Check1=self::_pageRankStrToNum($String,0x1505,0x21);
        $Check2=self::_pageRankStrToNum($String,0,0x1003F);
        $Check1>>=2;
        $Check1=(($Check1>>4)&0x3FFFFC0)|($Check1&0x3F);
        $Check1=(($Check1>>4)&0x3FFC00)|($Check1&0x3FF);
        $Check1=(($Check1>>4)&0x3C000)|($Check1&0x3FFF);
        $T1=(((($Check1&0x3C0)<<4)|($Check1&0x3C))<<2)|($Check2&0xF0F);
        $T2=(((($Check1&0xFFFFC000)<<4)|($Check1&0x3C00))<<0xA)|($Check2&0xF0F0000);
        return($T1|$T2);
    }
    /*
    * Generate a checksum for the hash string
    */
    protected static function CheckHash($Hashnum) {
        $CheckByte=0;
        $Flag=0;
        $HashStr=sprintf('%u',$Hashnum);
        $length=strlen($HashStr);
        for($i=$length-1;$i>=0;$i--) {
            // Square brackets instead of the removed curly-brace offset syntax; cast the digit to int.
            $Re=(int)$HashStr[$i];
            if(1===($Flag%2)) {
                $Re+=$Re;
                $Re=(int)($Re/10)+($Re%10);
            }
            $CheckByte+=$Re;
            $Flag++;
        }
        $CheckByte%=10;
        if(0!==$CheckByte) {
            $CheckByte=10-$CheckByte;
            if(1===($Flag%2)) {
                if(1===($CheckByte%2)) {
                    $CheckByte+=9;
                }
                $CheckByte>>=1;
            }
        }
        return '7'.$CheckByte.$HashStr;
    }
    public static function getPageRank($url) {
        $fp=fsockopen("toolbarqueries.google.com",80,$errno,$errstr,30);
        if(!$fp) {
            trigger_error("$errstr ($errno)<br />\n");
            return false;
        }
        else {
            $out="GET /search?client=navclient-auto&ch=".self::CheckHash(self::_pageRankHashURL($url))."&features=Rank&q=info:".$url."&num=100&filter=0 HTTP/1.1\r\n";
            $out.="Host: toolbarqueries.google.com\r\n";
            $out.="User-Agent: Mozilla/4.0 (compatible; GoogleToolbar 2.0.114-big; Windows XP 5.1)\r\n";
            $out.="Connection: Close\r\n\r\n";
            fwrite($fp,$out);
            #echo " U: http://toolbarqueries.google.com/search?client=navclient-auto&ch=".$this->CheckHash($this->_pageRankHashURL($url))."&features=Rank&q=info:".$url."&num=100&filter=0";
            #echo "\n";
            //$pagerank = substr(fgets($fp, 128), 4);
            //echo $pagerank;
            #echo "DATA:\n\n";
            $responseOK = false;
            $response = "";
            $inhead = true;
            $body = "";
            while(!feof($fp)) {

                $data=fgets($fp,128);

                if($data == "\r\n" && $inhead) {
                    $inhead = false;
                } else {
                    if(!$inhead) {
                        $body.= $data;
                    }
                }

                //if($data == '\r\n\r\n')
                $response .= $data;
                if(trim($data) == 'HTTP/1.1 200 OK') {
                    $responseOK = true;
                } 

                #echo "D ".$data;
                $pos=strpos($data,"Rank_");
                if($pos===false) {
                }
                else {
                    $pagerank=trim(substr($data,$pos+9));
                    if($pagerank === '0') {
                            fclose($fp);
                            return 0;
                    } else if(intval($pagerank) === 0) {
                        // Close the socket before throwing; nothing after a throw is executed.
                        fclose($fp);
                        throw new Exception('couldnt get pagerank from string: '.$pagerank);
                    } else {
                        fclose($fp);
                        return intval( $pagerank );
                    }
                }
            }
            fclose($fp);


            //var_dump($body);
            if($responseOK && $body=='') {
                return 0;
            }
            //return 0;
            // Socket is already closed above; a return after this throw would be unreachable.
            throw new Exception('couldnt get pagerank, unknown error. probably google flood block. my tests showed that 1req/sec is okay! i recommend a random sleep between 1.5 and 2 secs. no sleep breaks at ~1000 reqs.');
        }
    }

}
$url = "http://www.2xfun.de/";
$pagerank = Helper_Seo::getPageRank($url);
var_dump($pagerank); 
?>
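Since the lookup can return an integer, return false (connection failure), or throw (unparsable response or probable flood block), callers may want to fold the failure cases into one value. A minimal sketch, assuming a hypothetical wrapper named `tryPageRank` that receives the lookup as a callable:

```php
<?php
// Hypothetical wrapper; $lookup stands in for a call such as
// Helper_Seo::getPageRank and is injected so the sketch runs on its own.
function tryPageRank(callable $lookup, string $url): ?int
{
    try {
        $rank = $lookup($url);
    } catch (Exception $e) {
        // Unparsable response or probable flood block.
        return null;
    }
    // false signals a connection failure; only integers count as a rank.
    return is_int($rank) ? $rank : null;
}

// Possible usage (performs a real request, so it is left commented out):
// $rank = tryPageRank(['Helper_Seo', 'getPageRank'], 'http://www.2xfun.de/');
```

Whether swallowing the exception is appropriate depends on the caller; in a batch crawl it is probably better to stop on the first exception, because it may mean the IP has been blocked.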
The Surrican
  • +1 for a code body too long for me to want to read through to verify it's actually correct – Chris Aug 28 '10 at 13:43
  • lol :D I think it doesn't get any shorter, not with proper error handling (ignoring the 3 little functions at the end of the class, which are just there and don't need to be read; I left them in because they might be useful to somebody who wants to know the PR). – The Surrican Aug 28 '10 at 14:13
  • Hello, is there any way you can tell me about the API? And if Google blocks it, I hope they don't block the website the call comes from from using any Google services, because I use a lot of Google services. Is this some sort of violation? – Smith Sep 13 '10 at 15:16
  • There IS NO API. The provided code is reverse engineered from the Google Toolbar. AFAIK the IP is blocked from accessing this special service after ~1000 requests without delay. A 1-second delay seems to work fine; I would recommend a dynamic delay of 2-3 seconds. I don't think it will affect your site's Google ranking, but it's impossible to say for sure, and I would strongly recommend not doing any form of batch crawling from the IP that serves your website. – The Surrican Sep 13 '10 at 15:39