Python requests: Inconsistency in URL encoding?

Question

UPDATE: It turned out to be an inconsistency in the responses of the Instagram graphql (unofficial) API, which requires authentication for some IDs but does not for others for the same endpoint.

I am issuing GET requests against the Instagram graphql endpoint. For some queries, the JSON response I get via the Python requests module is inconsistent with what I get via a browser for the same query.

For example, this URL returns a JSON object containing 10 users as expected:

https://www.instagram.com/graphql/query/?variables=%7B%22shortcode%22%3A+%22BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0%22%2C+%22first%22%3A+10%7D&query_id=17864450716183058

But when I request the same URL via the requests module like this:

import requests

url = 'https://www.instagram.com/graphql/query/?variables=%7B%22shortcode%22%3A+%22BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0%22%2C+%22first%22%3A+10%7D&query_id=17864450716183058'
response = requests.get(url)

The returned value, i.e. response.text, is {"data": {"shortcode_media": null}, "status": "ok"}: an essentially empty response, which I suppose means that the media ID did not match anything.

As a double check, comparing the original URL with the URL of the final response shows that the requests module does not change the URL in any way:

>>> response.url == url
True
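
A complementary check that needs no network access is to decode the query string and re-encode it with the standard library. This is only a sketch of that round trip; the names `decoded` and `rebuilt` are illustrative, and it assumes the `variables` value was produced by `json.dumps` with its default separators, which is what the `%3A+` and `%2C+` sequences in the URL suggest.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

url = ('https://www.instagram.com/graphql/query/'
       '?variables=%7B%22shortcode%22%3A+%22BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0%22'
       '%2C+%22first%22%3A+10%7D&query_id=17864450716183058')

# Split off the query string and decode the percent-escapes.
parts = urlsplit(url)
query = parse_qs(parts.query)

# The "variables" parameter is itself a JSON document.
decoded = json.loads(query['variables'][0])
print(decoded)

# Re-encode the same values and compare with the original URL.
rebuilt = '{}://{}{}?{}'.format(
    parts.scheme,
    parts.netloc,
    parts.path,
    urlencode({
        'variables': json.dumps(decoded),
        'query_id': query['query_id'][0],
    }),
)
print(rebuilt == url)
```

If the rebuilt URL equals the original, the encoding itself is self-consistent, so any difference in the responses has to come from something other than how the URL is escaped.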

This only happens for long media IDs such as BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0. For shorter IDs, e.g. BZx5Zx9nHwS, the response returned by the requests module is the same as the one returned via the browser, as expected.

Rather than the length of the ID, I thought the cause might be a special character in the ID that is being encoded differently, such as the underscore. I tried encoding it as %5F, but that didn't work either.
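
For reference, the underscore is an unreserved character in URLs (RFC 3986), so the literal and `%5F` forms decode to the same string. A minimal sketch with the standard library, where `escaped` is just an illustrative name:

```python
from urllib.parse import quote, unquote

shortcode = 'BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0'

# Hand-escape the underscore, as tried above.
escaped = shortcode.replace('_', '%5F')

# Decoding the escaped form gives back the original shortcode.
print(unquote(escaped) == shortcode)

# quote() leaves the shortcode untouched: letters, digits and "_" are never escaped.
print(quote(shortcode) == shortcode)
```

Both comparisons hold, which makes the underscore an unlikely explanation, assuming the server decodes percent-escapes in the usual way.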

Any ideas? Can it be a bug in the requests module?

Try `requests.get('https://www.instagram.com/graphql/query/', params={'variables': '.. actual, non-escaped string'})` instead. You can [also ask for the complete request to be logged](https://stackoverflow.com/questions/10588644/how-can-i-see-the-entire-http-request-thats-being-sent-by-my-python-application), so you can see what is actually being sent over the wire. — MatsLindh, Oct 07 '17 at 20:42
Thanks for the comment and the idea. The exact same thing happens when I try with the non-escaped params as you suggested. The logging also shows that the requested URL is the same as the intended one. Can you also try with the two different IDs to see if you can replicate the issue? — onurmatik, Oct 07 '17 at 21:12
Pasting the first URL you have in your post verbatim into Chrome gives an empty shortcode_media. Are you authenticated to Instagram in your browser, but not through requests? — MatsLindh, Oct 07 '17 at 21:31
Ah! Yes, that was the problem. The strange thing is that the same query with a different ID (https://www.instagram.com/graphql/query/?variables=%7B%22shortcode%22%3A+%22BZx5Zx9nHwS%22%2C+%22first%22%3A+10%7D&query_id=17864450716183058) works without authentication, which sent me in a totally wrong direction. — onurmatik, Oct 08 '17 at 02:56
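
The `params=` suggestion from the comments can be tried without sending anything over the network. The following is a rough sketch, not a definitive recipe: it prepares the request and prints the URL that requests would use, with the unescaped `variables` string built by `json.dumps` (an assumption about how the original value was produced). The name `BASE_URL` is illustrative.

```python
import json

import requests

BASE_URL = 'https://www.instagram.com/graphql/query/'

# Pass the raw, non-escaped values and let requests do the encoding.
params = {
    'variables': json.dumps({'shortcode': 'BZx5Zx9nHwS', 'first': 10}),
    'query_id': '17864450716183058',
}

# prepare() builds the final request without sending it.
prepared = requests.Request('GET', BASE_URL, params=params).prepare()
print(prepared.url)
```

The printed URL can be compared character by character with the hand-escaped one. As the thread concludes, the differing responses came from authentication rather than from the encoding, so this check only rules the encoding out.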
