
I want to access my publicly available LinkedIn page. On my local machine, the following code works:

import requests
url = "http://de.linkedin.com/pub/ankush-shah/73/9/982"
html = requests.get(url).text
print html

It returns the correct HTML of my profile.

But when I execute the same code on my Heroku server, I am (I guess) redirected somewhere and get this HTML.

Also, when I try with urllib2 on the Heroku server:

import urllib2
url = "http://de.linkedin.com/pub/ankush-shah/73/9/982"
u = urllib2.urlopen(url)

This throws a urllib2.HTTPError: HTTP Error 999: Request denied

As I am using virtualenv, the libraries on my local machine are exactly the same as the ones installed on the Heroku server. Does LinkedIn block HTTP requests from servers like Heroku? Any help or suggestions would be appreciated.

Ankush Shah
  • Why not test for this directly? Change the user agent on the request on the Heroku server to match the user agent from the other machine. – dilbert May 24 '14 at 09:47
  • You mean something like requests.get(url, headers={'User-agent': 'Mozilla/5.0'}).text? This works on my local machine but still not on Heroku. – Ankush Shah May 24 '14 at 10:07
  • There's no platform information in that user agent string. Try a string from [here](http://www.useragentstring.com/pages/Firefox/). – dilbert May 24 '14 at 10:19
  • I tried a couple of strings from there, but still no luck. – Ankush Shah May 24 '14 at 10:24
  • Hang on. If Heroku is a hosted service, it has a static IP range (probably). Perhaps LinkedIn has IP blocked Heroku itself. This means you might need to proxy (or not use Heroku). – dilbert May 24 '14 at 10:26
  • Yes, you are right. LinkedIn does not allow such requests: https://developer.linkedin.com/forum/heroku-requests-return-999 – Ankush Shah May 24 '14 at 12:02
  • You should post that as the answer. – dilbert May 24 '14 at 12:05

1 Answer


As mentioned here, LinkedIn does not allow direct access. They have blacklisted Heroku's IP address, and the only way to access the data is to use their APIs.
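
To check whether a request is being redirected or refused outright, something like the following might help. It is only a sketch: describe_response and fetch_and_describe are made-up helper names, not part of requests, and it assumes the refusal shows up as status 999, as in the urllib2 error from the question.

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3

# Status code reported in the question; it is not a standard HTTP status.
LINKEDIN_DENIED = 999


def describe_response(status_code, redirect_chain):
    """Summarise a response from its status code and the URLs it was redirected through.

    Hypothetical helper for illustration only.
    """
    if status_code == LINKEDIN_DENIED:
        return "denied (999): the server refused the request"
    if redirect_chain:
        return "redirected via %s, final status %d" % (
            " -> ".join(redirect_chain),
            status_code,
        )
    return "direct response, status %d" % status_code


def fetch_and_describe(url):
    """Fetch url with requests and describe what came back."""
    import requests  # imported here so describe_response works without it

    response = requests.get(url)
    # response.history holds the intermediate redirect responses, oldest first.
    chain = [r.url for r in response.history]
    return describe_response(response.status_code, chain)
```

Printing fetch_and_describe(url) for the profile URL on both the local machine and the Heroku server should show whether the two environments get a different status or redirect chain, which would point at the server's IP rather than the code.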

Ankush Shah
  • @Ankush_Shah: Did they remove the IP address from their blacklist after a while? – mmx73 May 29 '15 at 15:01
  • I am not aware of it, as I switched to their API, which is way better than scraping the data directly. So I doubt they have any reason to remove the blacklisted IP addresses. – Ankush Shah May 30 '15 at 08:59