
I am scraping a LinkedIn profile page with cURL. Extracting data from the public URL http://in.linkedin.com/in/ratneshdwivedi works. However, when I am logged in to LinkedIn and try to harvest data from http://www.linkedin.com/profile/view?id=77597832&locale=en_US&trk=tyah2&trkInfo=tas%3Aravi%20kant%20mishra%2Cidx%3A1-1-1, it does not work and returns blank data instead.

The following is my source code:

// Called from inside the scraper class: fetch the page, then extract the profile name.
$html = $this->_getScrapingData('http://in.linkedin.com/in/ratneshdwivedi', 10);
// Non-greedy capture so the match stops at the first closing </span>.
preg_match('/<span class="full-name">(.*?)<\/span>/i', $html, $match);

private function _getScrapingData($url, $timeout) {
    $ch = curl_init($url); // initialize curl with the given URL
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]); // forward the visitor's user agent
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to wait for the connection
    curl_setopt($ch, CURLOPT_FAILONERROR, true); // treat HTTP status codes >= 400 as failures
    return curl_exec($ch); // response body, or false on failure
}

Thanks in advance

Nathan
ratnesh dwivedi

3 Answers


Your script is not using the same cookies as your browser. You need to go through the login form with your script first.

Use

CURLOPT_COOKIEJAR
CURLOPT_COOKIEFILE

to keep the cookies through your requests.
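
As a rough illustration of those two options, here is a minimal sketch. The helper name cookieOptions and the temporary cookie file are assumptions made for this example; they are not part of the question's code.

```php
<?php
// Hypothetical helper: returns cURL options that make a handle
// read and write its cookies through a single file.
function cookieOptions(string $cookieFile): array
{
    return [
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is freed
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are loaded from here; also enables the cookie engine
        CURLOPT_RETURNTRANSFER => true,        // return the response instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects, if any
    ];
}

// Assumed location for the cookie file; any writable path works.
$cookieFile = tempnam(sys_get_temp_dir(), 'cookies_');

$ch = curl_init();
curl_setopt_array($ch, cookieOptions($cookieFile));
```

Any request made with a handle configured this way sends the cookies stored so far and records new ones from the response.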

Patrick

Is your script authenticating?

The page you linked can only be viewed after logging in. That would explain why your script returns empty data: the full-name span does not exist on the login page you are redirected to.
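
One way to make that failure visible instead of silently getting blank data is to check whether the span was found at all. The sketch below assumes a hypothetical helper named extractFullName; the sample HTML strings are made up for illustration.

```php
<?php
// Hypothetical helper: returns the profile name, or null when the
// full-name span is missing (for example on a login page).
function extractFullName(string $html): ?string
{
    // Non-greedy capture so the match stops at the first closing </span>.
    if (preg_match('/<span class="full-name">(.*?)<\/span>/i', $html, $match)) {
        return trim($match[1]);
    }
    return null;
}

// Made-up sample inputs:
var_dump(extractFullName('<span class="full-name">Jane Doe</span>')); // string(8) "Jane Doe"
var_dump(extractFullName('<form id="login"></form>'));                // NULL
```

In the real script, curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) called after curl_exec reports the final URL after redirects, which shows whether the request ended up on a login page.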

You may also want to check http://developer.linkedin.com/documents/profile-api as there are far nicer ways of accomplishing this than scraping pages.

Nathan

I think the problem is that you are logged in in your browser (I guess your browser has a cookie with a session ID), but when you call cURL it knows nothing about your cookies.

The solution is to first send a login request with your credentials and save the cookies LinkedIn returns. Then make whatever requests you want with those cookies. Search for how to send cookies via PHP cURL; I'm sure someone has asked this before.
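
The two-step flow described above might look roughly like this. It is only a sketch: the function name fetchAfterLogin is hypothetical, and the login URL and form field names must be taken from the real login form (which may also require hidden tokens), so they are passed in as parameters rather than guessed here.

```php
<?php
// Sketch only: $loginUrl and the keys of $fields are placeholders,
// not LinkedIn's actual login endpoint or form fields.
function fetchAfterLogin(string $loginUrl, array $fields, string $targetUrl, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_COOKIEJAR      => $cookieFile, // persist received cookies
        CURLOPT_COOKIEFILE     => $cookieFile, // send stored cookies
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
    ]);

    // Step 1: submit the login form so the session cookies are received.
    curl_setopt_array($ch, [
        CURLOPT_URL        => $loginUrl,
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query($fields),
    ]);
    if (curl_exec($ch) === false) {
        return false; // login request failed
    }

    // Step 2: reuse the same handle (and its cookies) for the protected page.
    curl_setopt_array($ch, [
        CURLOPT_URL     => $targetUrl,
        CURLOPT_HTTPGET => true, // switch back from POST to GET
    ]);
    return curl_exec($ch); // page HTML, or false on failure
}
```

Reusing one handle keeps the cookies in memory between the two requests; the cookie file only matters if later requests use a new handle.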

Btw., I think LinkedIn has an API that you can use instead.

martin