I am trying to pull the number of followers from a list of Instagram accounts. I have tried using the string find method on the text returned by Requests; however, the string that I see when I inspect the actual Instagram page no longer appears when I print "r" from the code below.

I was able to get this code to run successfully in the past, but it no longer works. See also: Webscraping Instagram follower count BeautifulSoup

import requests

user = "espn"
url = 'https://www.instagram.com/' + user
r = requests.get(url).text

start = '"edge_followed_by":{"count":'
end = '},"followed_by_viewer"'

print(r[r.find(start)+len(start):r.rfind(end)])

I get "-1" back, which means the substring passed to the find method was not found within the variable "r".
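
To illustrate the "-1" behaviour, here is a minimal sketch that checks the result of find before slicing. The helper name extract_between and the sample string are made up for illustration; the sample is only modelled on the markers used above and is not real Instagram output.

```python
def extract_between(text, start, end):
    """Return the text between start and end, or None if a marker is missing."""
    i = text.find(start)
    if i == -1:          # str.find returns -1 when the substring is absent
        return None
    i += len(start)
    j = text.find(end, i)
    if j == -1:
        return None
    return text[i:j]


start = '"edge_followed_by":{"count":'
end = '},"followed_by_viewer"'

# Hypothetical sample modelled on the markers in the question.
sample = '{"edge_followed_by":{"count":12345},"followed_by_viewer":false}'

print(extract_between(sample, start, end))                 # 12345
print(extract_between("<html>login</html>", start, end))   # None
```

Without such a check, a -1 from find is silently used as a slice index, so the print shows an unrelated piece of the page instead of raising an error.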

JMH

2 Answers

I think it's because of the last ' in start and first ' in end...this will work:

import requests
import re

user = "espn"
url = 'https://www.instagram.com/' + user
r = requests.get(url).text
followers = re.search('"edge_followed_by":{"count":([0-9]+)}',r).group(1)

print(followers)

'14061730'
Derek Eden
  • Thanks. I actually get an "AttributeError: 'NoneType' object has no attribute 'group'" error. From what I have read, this happens when the search finds no match. I'm now wondering whether it has something to do with my cache and Requests, as I pinged the site a couple hundred times earlier today. Does that make sense? When I read the HTML text, I no longer see the search strings I shared above, but I can see them when I inspect the page in Chrome. – JMH Oct 19 '19 at 23:42
  • To be honest, I don't have a lot of experience with this. I can say I have had my IP address temporarily blocked by websites for doing the same thing, so I would not doubt it :D – Derek Eden Oct 20 '19 at 00:59
  • Thanks again, it worked. I believe my IP was temporarily blocked, as it worked this morning. – JMH Oct 20 '19 at 16:12
  • Hi @JMH, how did you fix your error? I get the same error when I run it on the server, but it works fine on my PC. P.S. I can log in fine from the server via Selenium, so I do have access to Instagram. It seems something breaks when I access it via requests. – Newskooler Jun 10 '20 at 02:47
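
A note on the AttributeError from the comments: re.search returns None when nothing matches, so calling .group(1) on the result fails. A minimal sketch of a guard follows; the function name parse_follower_count and the sample strings are hypothetical, and the pattern is the one from the answer above with the braces escaped.

```python
import re

# Same pattern as in the answer above, with the literal braces escaped.
PATTERN = re.compile(r'"edge_followed_by":\{"count":([0-9]+)\}')


def parse_follower_count(html):
    """Return the follower count as an int, or None if the marker is missing."""
    match = PATTERN.search(html)
    if match is None:  # e.g. a login or block page was served instead
        return None
    return int(match.group(1))


# Hypothetical inputs for illustration only.
print(parse_follower_count('x"edge_followed_by":{"count":14061730}y'))  # 14061730
print(parse_follower_count("<html>Please log in</html>"))               # None
```

Returning None lets the caller tell "the page did not contain the data" apart from a real count, which is useful when the site temporarily serves a different page.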

I want to suggest an updated solution to this question, because Derek Eden's answer above from 2019 no longer works, as noted in its comments.

The solution was to add the r prefix (raw string notation) before the regular expression in re.search, like so:

follower_count = re.search(r'"edge_followed_by\\":{\\"count\\":([0-9]+)}', response).group(1)

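The r prefix is really important here: without it, Python processes the backslash escapes in the string literal before the regex engine sees the pattern, so \\" reaches the regex as \", which matches a bare double quote rather than a backslash followed by a quote, and the search finds nothing.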
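
As a small illustration of the difference, the sketch below runs the same pattern text with and without the r prefix against a made-up sample that contains backslash-escaped quotes. The sample string is an assumption modelled on the escaped page source, not real output.

```python
import re

# Hypothetical sample: each double quote is preceded by one backslash.
sample = r'\"edge_followed_by\":{\"count\":110070}'

raw_pattern = r'\\"count\\":([0-9]+)'    # regex sees \\" -> backslash + quote
plain_pattern = '\\"count\\":([0-9]+)'   # regex sees \"  -> just a quote

print(len(r'\\"'), len('\\"'))                      # 3 2
print(re.search(raw_pattern, sample).group(1))      # 110070
print(re.search(plain_pattern, sample))             # None
```

The two literals look identical apart from the prefix, but they are different strings, so they compile to different patterns.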

The Instagram page also seems to contain backslashes in the object we are looking for, at least in my tests, so the code example I use is the following, written for Python 3.10 and working as of July 2022:

# get follower count of instagram profile
import os.path
import requests
import re
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# get instagram follower count
def get_instagram_follower_count(instagram_username):
    url = "https://www.instagram.com/" + instagram_username
    filename = "instagram.html"

    try:
        if not os.path.isfile(filename):
            r = requests.get(url, verify=False)
            print(r.status_code)
            print(r.text)
            response = r.text

            if r.status_code != 200:
                raise Exception("Error: " + str(r.status_code))
            
            with open(filename, "w") as f:
                f.write(response)

        else:
            with open(filename, "r") as f:
                response = f.read()
                # print(response)

        follower_count = re.search(r'"edge_followed_by\\":{\\"count\\":([0-9]+)}', response).group(1)
        return follower_count

    except Exception as e:
        print(e)
        return 0


print(get_instagram_follower_count('your.instagram.profile'))

The function returns the follower count as expected. Please note that I added a few lines that save the response to a file, so that I don't hammer Instagram's web server and get blocked while testing.

This is a slice of the original html content that contains the part we are looking for:

... mRL&s=1\",\"edge_followed_by\":{\"count\":110070},\"fbid\":\"1784 ...
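
Since the page has been seen both with and without the backslashes, one option is a pattern that makes each backslash optional. This is only a sketch under that assumption, not part of the code above; the name FOLLOWERS_RE is made up, and the markup may change again.

```python
import re

# \\? makes the backslash before each quote optional, so both forms match.
FOLLOWERS_RE = re.compile(
    r'\\?"edge_followed_by\\?":\{\\?"count\\?":([0-9]+)\}'
)

# The escaped slice shown above, and the unescaped form from the older answer.
escaped = r'mRL&s=1\",\"edge_followed_by\":{\"count\":110070},\"fbid\":\"1784'
unescaped = '"edge_followed_by":{"count":14061730}'

print(FOLLOWERS_RE.search(escaped).group(1))    # 110070
print(FOLLOWERS_RE.search(unescaped).group(1))  # 14061730
```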

I debugged the regex in RegExr, and it seems to work just fine at this point in time.

There are many posts about the regex r prefix, like this one.

The documentation of the re module also recommends raw string notation for patterns that contain backslashes, which is the issue here.

mwx