
I am trying to scrape data from LinkedIn's public profiles using Scrapy. However, every request returns a 999 response code. I am using RandomUserAgentMiddleware to randomize the User-Agent strings.

The strange thing is that my IP does not appear to be blocked, since I can still open LinkedIn in my browser. Are there any specific fields I need to pass in my request headers?

I have tried setting 'Accept-Encoding': 'gzip, deflate' in the request headers, following one of the Stack Overflow questions, but I still get a 999 response code.

Edit:

If I manually set USER_AGENT in the settings file, it works, but if I set it through the RandomUserAgent middleware, it doesn't, even though the request headers are the same in both cases.
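For reference, here is a minimal sketch of what a random User-Agent downloader middleware does. The class name `HardcodedRandomUserAgentMiddleware` and the `USER_AGENTS` pool are hypothetical stand-ins (the middleware in the question reads its strings from useragents.txt instead); only the `process_request(request, spider)` hook is the standard Scrapy downloader-middleware interface.

```python
import random

# Hypothetical hardcoded pool; the middleware in the question loads its
# strings from useragents.txt instead.
USER_AGENTS = [
    "Mozilla/5.0 (Linux; Android 5.1.1; SM-G928X Build/LMY47X) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/47.0.2526.83 Mobile Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/52.0.2743.116 Safari/537.36",
]


class HardcodedRandomUserAgentMiddleware:
    """Sketch of a downloader middleware that sets a random User-Agent."""

    def __init__(self, user_agents=USER_AGENTS, rng=None):
        self.user_agents = list(user_agents)
        # An injectable random generator makes the choice reproducible in tests.
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Plain assignment overwrites any User-Agent already on the request.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # Returning None tells Scrapy to keep processing the request.
        return None
```

A class like this would be enabled through the `DOWNLOADER_MIDDLEWARES` setting. Logging the value it assigns for each request is one way to see which string is actually attached when a 999 comes back.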

Request headers with the RandomUserAgent middleware:

{'Accept-Language': ['en-US,en;q=0.8'], 'Accept-Encoding': ['gzip, deflate, sdch, br'], 'Host': ['www.linkedin.com'], 'Accept': ['text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'], 'Upgrade-Insecure-Requests': ['1'], 'Connection': ['keep-alive'], 'User-Agent': ['Mozilla/5.0 (Linux; Android 5.1.1; SM-G928X Build/LMY47X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.83 Mobile Safari/537.36']}

Request headers with the user agent set manually:

{'Accept-Language': ['en-US,en;q=0.8'], 'Accept-Encoding': ['gzip, deflate, sdch, br'], 'Host': ['www.linkedin.com'], 'Accept': ['text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'], 'Upgrade-Insecure-Requests': ['1'], 'Connection': ['keep-alive'], 'User-Agent': ['Mozilla/5.0 (Linux; Android 5.1.1; SM-G928X Build/LMY47X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.83 Mobile Safari/537.36']}
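
The two dumps above look identical by eye. A minimal sketch for checking that programmatically, assuming each dump is a dict mapping header names to lists of values as printed above; `diff_headers` is a hypothetical helper, not part of Scrapy.

```python
def diff_headers(first, second):
    """Return {name: (value_in_first, value_in_second)} for differing headers."""
    # Header names are case-insensitive, so compare them lowercased.
    first = {name.lower(): values for name, values in first.items()}
    second = {name.lower(): values for name, values in second.items()}
    return {
        name: (first.get(name), second.get(name))
        for name in sorted(set(first) | set(second))
        if first.get(name) != second.get(name)
    }


# Abbreviated versions of the two dumps, using values copied from them.
with_middleware = {"Host": ["www.linkedin.com"], "Connection": ["keep-alive"]}
with_setting = {"Host": ["www.linkedin.com"], "Connection": ["keep-alive"]}

print(diff_headers(with_middleware, with_setting))
```

An empty result only says that the dumped dicts match. It does not show at which point in the request's life each dump was taken.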
vigenere
  • You might want to look at http://stackoverflow.com/a/27231544/5754656. The RandomUA is probably choosing some UAs that are blocked by LinkedIn. Randomly pick from hardcoded ones (e.g. Firefox, Chrome, IE, Edge, Safari and various versions of those) instead. – Artyer Aug 31 '16 at 18:09
  • I tried with various useragent strings but still got 999. I even tried with my browser's user agent but still didn't work. – vigenere Aug 31 '16 at 18:10
  • How are you setting the UA? Give some example code as that may be why it isn't working. – Artyer Aug 31 '16 at 18:11
  • I add it to the useragents.txt file that is used by the RandomUA middleware. – vigenere Aug 31 '16 at 18:13
  • Pretty sure Linkedin do not want you scraping their site, why not use the API? – Padraic Cunningham Aug 31 '16 at 20:32
  • Possible duplicate of [999 Error Code on HEAD request to LinkedIn](https://stackoverflow.com/questions/27231113/999-error-code-on-head-request-to-linkedin) – Caleb Bell Jun 05 '17 at 13:58

0 Answers