UPDATE: It turned out to be an inconsistency in the responses of the Instagram GraphQL (unofficial) API: for the same endpoint, it requires authentication for some media IDs but not for others.
I am issuing GET requests against the Instagram GraphQL endpoint. For some queries, the JSON response I get via the Python requests module is inconsistent with what I get in a browser for the same query.
For example, this URL returns a JSON object containing 10 users, as expected:
https://www.instagram.com/graphql/query/?variables=%7B%22shortcode%22%3A+%22BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0%22%2C+%22first%22%3A+10%7D&query_id=17864450716183058
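The long query string is just URL-encoded JSON. The minimal sketch below rebuilds the exact URL above with the standard library instead of hand-encoding it; `build_query_url` is a hypothetical helper, not part of any Instagram or requests API. It also shows that the underscore in the shortcode is left unencoded by the standard encoder.

```python
import json
from urllib.parse import urlencode

BASE = "https://www.instagram.com/graphql/query/"

def build_query_url(shortcode: str, first: int, query_id: str) -> str:
    # "variables" is a JSON object; json.dumps uses ", " and ": " separators
    # by default, which is where the "+" (encoded spaces) in the URL comes from.
    variables = json.dumps({"shortcode": shortcode, "first": first})
    # urlencode uses quote_plus: "{" -> %7B, '"' -> %22, ":" -> %3A, " " -> "+";
    # letters, digits and "_" are unreserved and pass through unchanged.
    return BASE + "?" + urlencode({"variables": variables, "query_id": query_id})

url = build_query_url("BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0", 10, "17864450716183058")
print(url)
```

The printed URL is character-for-character identical to the one above, so the encoding of the request is not where the two clients differ.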
But when I request the same URL via the requests module like this:
import requests
url = 'https://www.instagram.com/graphql/query/?variables=%7B%22shortcode%22%3A+%22BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0%22%2C+%22first%22%3A+10%7D&query_id=17864450716183058'
response = requests.get(url)
The returned value, i.e. response.text, is {"data": {"shortcode_media": null}, "status": "ok"}, an essentially empty response, which I suppose means something like the media ID not being matched.
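Note that this reply still carries "status": "ok", so checking the HTTP status code alone will not catch it. A minimal sketch of detecting this case from the response body; `shortcode_media_is_null` is a hypothetical helper name, and the check assumes the reply shape shown above.

```python
import json

def shortcode_media_is_null(body: str) -> bool:
    """Return True when a GraphQL reply reports "ok" but contains no media object."""
    payload = json.loads(body)
    # "data" itself could in principle be null, so fall back to an empty dict.
    data = payload.get("data") or {}
    return payload.get("status") == "ok" and data.get("shortcode_media") is None

empty = '{"data": {"shortcode_media": null}, "status": "ok"}'
print(shortcode_media_is_null(empty))
```

With requests, the same check can be applied to `response.text` before trying to read any users out of the reply.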
As a double check, comparing the original URL with the URL of the final response gives True, showing that the requests module does not change the URL in any way:
>>> response.url == url
True
This only happens for long media IDs such as BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0. For shorter IDs, e.g. BZx5Zx9nHwS, the response returned by the requests module is the same as the one returned by the browser, as expected.
Rather than the length of the ID, I thought it might be a special character in the ID, such as the underscore, being encoded differently. I tried encoding it as %5F, but that didn't work either.
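Manually writing %5F is not expected to change anything: "_" is an unreserved character in RFC 3986, so a percent-encoded underscore is equivalent to the literal one and decodes to the same value. A small sketch with the standard library illustrating this:

```python
from urllib.parse import quote, unquote

shortcode = "BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0"

# quote() leaves unreserved characters (letters, digits, "-", ".", "_", "~") alone,
# so the shortcode is already in its encoded form.
same_when_quoted = quote(shortcode) == shortcode

# A hand-written %5F decodes back to exactly the same character sequence.
manual = shortcode.replace("_", "%5F")
same_when_decoded = unquote(manual) == shortcode

print(same_when_quoted, same_when_decoded)
```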
Any ideas? Could it be a bug in the requests module?
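Following the update at the top (some IDs require authentication), one way to confirm the diagnosis is to send the scripted request with the same login cookie the browser uses. The sketch below uses the standard library's `urllib.request` so it has no third-party dependency; `requests.get` accepts an equivalent `cookies=` dict. The cookie name `sessionid`, the placeholder value, and the helper `build_authenticated_request` are assumptions for illustration, not a documented Instagram API, and the sketch only builds the request without sending it.

```python
import urllib.request

def build_authenticated_request(url: str, session_cookie: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying a browser login cookie.

    ASSUMPTION: the cookie is named "sessionid" and its value is copied from a
    logged-in browser session; this is not a documented, stable API.
    """
    return urllib.request.Request(
        url,
        headers={"Cookie": f"sessionid={session_cookie}"},
        method="GET",
    )

req = build_authenticated_request(
    "https://www.instagram.com/graphql/query/?query_id=17864450716183058",
    "YOUR_SESSION_ID",  # hypothetical placeholder, replace with a real value
)
print(req.get_method(), req.get_header("Cookie"))
# urllib.request.urlopen(req) would actually send it.
```

If the long-ID query returns users with the cookie and `shortcode_media: null` without it, the difference lies in authentication rather than in how the URL is encoded.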