
I need to fetch basic profile data (the complete HTML page) of a LinkedIn profile. I tried Python packages such as Beautiful Soup, but I get access denied.

I have generated the API tokens for LinkedIn, but I am not sure how to incorporate those into the code.

Basically, I want to automate the process of scraping by just providing the company name.

Please help. Thanks!

Gaurav Chavan
  • I know this was posted a year ago but my work around to getting data from LinkedIn without using the API was by using selenium to login and navigate to the desired page and then taking the html from the page (using beautiful soup) which I could then pull the data from. – EatSleepCode Jun 12 '19 at 17:20

1 Answer


Beautiful Soup is a web scraper. Typically, people use this library to parse data from public websites or websites that don't have APIs. For example, you could use it to scrape the top 10 Google Search results.

Unlike web scrapers, an API lets you retrieve data behind non-public websites. Furthermore, it returns the data in an easily readable XML or JSON format, so you don't have to "scrape" an HTML file for the specific data you care about.

To make an API call to LinkedIn, you need to use a Python HTTP request library. See this stackoverflow post for examples.

Take a look at Step 4 of the LinkedIn API documentation. It shows a sample HTTP GET call.

GET /v1/people/~ HTTP/1.1
Host: api.linkedin.com
Connection: Keep-Alive
Authorization: Bearer AQXdSP_W41_UPs5ioT_t8HESyODB4FqbkJ8LrV_5mff4gPODzOYR

Note that you also need to send an "Authorization" header along with the HTTP GET call. This is where your token goes. You're probably getting an access-denied error right now because you didn't set this header in your request.

Here's an example of how you would add that header to a request with the requests library.
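A minimal sketch with the `requests` library might look like this. The endpoint URL follows the sample GET call above; the token value is a placeholder you'd swap for the one you generated:

```python
import requests

# Placeholder -- substitute the access token you generated for your app
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"

def auth_headers(token):
    """Build the Authorization header the LinkedIn API expects."""
    return {"Authorization": f"Bearer {token}"}

# Uncomment to actually send the request:
# response = requests.get(
#     "https://api.linkedin.com/v1/people/~?format=json",
#     headers=auth_headers(ACCESS_TOKEN),
# )
# print(response.status_code)
# print(response.json())
```

Without that header, LinkedIn rejects the request, which matches the access-denied behavior you saw with plain scraping.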

And that should be it. When you make that request, it should return an XML or JSON response containing the data you want. You can then use an XML or JSON parser to pull out the specific fields.
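For the JSON case, parsing with the standard library is enough. The field names below are only illustrative of what a profile response might contain, not a guaranteed schema:

```python
import json

# Hypothetical sample of a JSON profile response (illustrative field names)
sample = '{"firstName": "Jane", "lastName": "Doe", "headline": "Engineer"}'

profile = json.loads(sample)
full_name = f"{profile['firstName']} {profile['lastName']}"
print(full_name)  # Jane Doe
```

From there you can pick out whichever fields you care about instead of scraping them from HTML.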

nareddyt