5

I'm just starting to learn how to web scrape using BeautifulSoup and want to write a simple program that will get the follower count for a given Instagram page. I currently have the following script (pulled from another Q&A thread):

import requests
from bs4 import BeautifulSoup

user = "espn"
url = 'https://www.instagram.com/'+ user
r = requests.get(url)
soup = BeautifulSoup(r.content)
followers = soup.find('meta', {'name': 'description'})['content']
follower_count = followers.split('Followers')[0]
print(follower_count)

# 10.7m

The problem I am running into is I want to get a more precise number, which you can see when you hover the mouse over the follower count on the Instagram page (e.g., 10,770,816).

Unfortunately, I have not been able to figure out how to do this with BeautifulSoup. I'd like to do this without the API since I am combining this with code to track other social media platforms. Any tips?

Abraham
  • 8,525
  • 5
  • 47
  • 53
user10331423
  • 53
  • 1
  • 1
  • 4

5 Answers5

11

Use the API is the easiest way, but I also found a very hacky way to do it:

import requests

username = "espn"
url = 'https://www.instagram.com/' + username
r = requests.get(url).text

start = '"edge_followed_by":{"count":'
end = '},"followed_by_viewer"'
followers= r[r.find(start)+len(start):r.rfind(end)]

start = '"edge_follow":{"count":'
end = '},"follows_viewer"'
following= r[r.find(start)+len(start):r.rfind(end)]

print(followers, following)

If you look through the response requests gives, theres a line of Javascript that contains the real follower count:

...edge_followed_by":{"count":10770969},"followed_by_viewer":{...

So I just extracted the number by finding the substring before and after.

Smart Manoj
  • 5,230
  • 4
  • 34
  • 59
Sam
  • 608
  • 4
  • 11
4

Instagram always responds with JSON data, making it a usually cleaner option to obtain metadata from the JSON, rather than parsing the HTML response with BeautifulSoup. Given that using BeatifulSoup is not a constraint, there are at least two clean options to get the follower count of an Instagram profile:

  1. Obtain the profile page, search the JSON and parse it:

    import json
    import re
    import requests
    
    response = requests.get('https://www.instagram.com/' + PROFILE)
    json_match = re.search(r'window\._sharedData = (.*);</script>', response.text)
    profile_json = json.loads(json_match.group(1))['entry_data']['ProfilePage'][0]['graphql']['user']
    
    print(profile_json['edge_followed_by']['count'])
    

    Then, profile_json variable contains the profile's metadata, not only the follower count.

  2. Use a library, leaving changes of Instagram's responses the upstream's problem. There is Instaloader, which can be used liked this:

    from instaloader import Instaloader, Profile
    
    L = Instaloader()
    profile = Profile.from_username(L.context, PROFILE)
    
    print(profile.followers)
    

    It also supports logging in, allowing to access private profiles as well.

    (disclaimer: I am authoring this tool)

Either way, you obtain a structure containing the profile's metadata, without needing to do strange things to the html response.

aandergr
  • 627
  • 7
  • 13
1

Here is my approach ( the html source code has a json object that has all the data of the profile )

import json
import urllib.request, urllib.parse
from bs4 import BeautifulSoup   

req      = urllib.request.Request(myurl)
req.add_header('User-Agent','Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36')
html     = urllib.request.urlopen(req).read()
response = BeautifulSoup(html, 'html.parser')
jsonObject = response.select("body > script:nth-of-type(1)")[0].text.replace('window._sharedData =','').replace(';','')
data      = json.loads(jsonObject)
following = data['entry_data']['ProfilePage'][0]['graphql']['user']['edge_follow']['count']
followed  = data['entry_data']['ProfilePage'][0]['graphql']['user']['edge_followed_by']['count']
posts     = data['entry_data']['ProfilePage'][0]['graphql']['user']['edge_owner_to_timeline_media']['count']
username  = data['entry_data']['ProfilePage'][0]['graphql']['user']['edge_owner_to_timeline_media']['edges'][0]['node']['owner']['username']
Ahmed Soliman
  • 1,662
  • 1
  • 11
  • 16
0

Although this is not really a general question on programming, you should find that the exact follower count is the title property of the span element containing the formatted follower count. You can query this property.

Ulysse Mizrahi
  • 681
  • 7
  • 19
0

The easist method to do this would be to dump the page html into a text editor and do a text search for the exact number of followers the person has. You can then zero into the element which contains the number.

Mathew Savage
  • 168
  • 11