Python 3.5.2 Iterating a get request

Question

I am hoping someone can tell me whether this script works the way I intended and, if not, explain what I am doing wrong.

The RESTful API I am using has a pageSize parameter that accepts values from 10 to 50. I used pageSize=50. There is also a pageNumber parameter, which I did not use.

So, I thought this would be the right way to make the get request:

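# Python 3.5.2
import requests

# url is the API endpoint, requested with pageSize=50 (defined elsewhere)
r = requests.get(url, stream=True)
with open("file.txt", "w", newline="", encoding="utf-8") as fd:
    # r.text reads and decodes the whole body of this single response
    text_out = r.text
    fd.write(text_out)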

UPDATE
I think I understand a bit better. I read the documentation in more detail, but I am still missing how to get the entire data set from the API. Here is some more information:

verbs = requests.options(r.url)
print(verbs.headers)
{'Server': 'ninx', 'Date': 'Sat, 24 Dec 2016 22:50:13 GMT',
'Allow': 'OPTIONS,HEAD,GET', 'Content-Length': '0', 'Connection': 'keep-alive'}
print(r.headers)
{'Transfer-Encoding': 'chunked', 'Vary': 'Accept-Encoding',
'X-Entity-Count': '50', 'Connection': 'keep-alive', 
'Content-Encoding': 'gzip', 'Date': 'Sat, 24 Dec 2016 23:59:07 GMT', 
'Server': 'ninx', 'Content-Type': 'application/json; charset=UTF-8'}

Should I create a session and use the previously unused pageNumber parameter to create a new url until the 'X-Entity-Count' is zero? Or, is there a better way?

`chunk_size` is purely about the number of raw bytes you're reading from the response at a time. It's not related to whatever higher-level pieces (pages, objects, whatever) the API is returning to you. — pvg, Dec 23 '16 at 00:09
Thank you. Does that mean I should remove the chunk_size=50 and the request should then iterate correctly through the multiple pages? — BSCowboy, Dec 23 '16 at 00:11
Without `chunk_size`, you just read the whole contents of the response into memory and write it to a file. With `chunk_size`, you are doing it 50 bytes at a time. None of this has anything to do with the pages. It's not clear why you care about the pages, since you're writing the whole thing into a file anyway. — pvg, Dec 23 '16 at 00:12
I expected more results; the first few lines of the JSON have a "count": 25595. — BSCowboy, Dec 23 '16 at 00:18
Right. You're assuming `chunk_size` is in some way related to the JSON or whatever other entities the request is returning. It isn't. That's really all there is to it. If you want to iterate over things in the JSON structure, you should be parsing the JSON. There are methods in the requests library for that; review the docs. Your current implementation is basically looking at a raw bag of bytes. It doesn't know JSON or pages from Adam. — pvg, Dec 23 '16 at 00:20
Thanks. I will take out the loop and re-run it to see if the response changes. I don't have a problem parsing the JSON. I am just unfamiliar with, and quite ignorant of, interacting with APIs. — BSCowboy, Dec 23 '16 at 00:28
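
To illustrate the point made in the comments, here is a minimal sketch that needs no network access. It mimics reading a body 50 bytes at a time and shows that the chunks are just slices of the same bytes; only parsing the reassembled JSON exposes the entities. The "items" key and the helper iter_chunks are hypothetical stand-ins, and the "count" value is the one quoted above.

```python
import json

# Hypothetical response body: "count" is taken from the thread, "items" is an
# assumed key holding the 50 entities of a single page.
body = json.dumps({"count": 25595, "items": list(range(50))}).encode("utf-8")


def iter_chunks(data, chunk_size):
    """Yield fixed-size byte slices, the way chunked reading of a body does."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


chunks = list(iter_chunks(body, chunk_size=50))

# Joining the chunks gives back exactly the same bytes: still one page.
reassembled = b"".join(chunks)

# Only parsing the JSON reveals the entities and the total count.
parsed = json.loads(reassembled.decode("utf-8"))
print(len(chunks), "chunks,", len(parsed["items"]), "items on this page,",
      parsed["count"], "items in total")
```

Changing chunk_size only changes how many slices there are, never how many entities the server sent.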

score 0 · Accepted Answer · edited May 23 '17 at 12:33

I found a discussion that helped clear this matter up for me: "API pagination best practices". This updated question should probably be deleted.
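
For readers with the same problem, here is a minimal sketch of the loop described in the question's update. It is not taken from the linked discussion, and it rests on assumptions this thread does not confirm: that pageNumber starts at 1, and that X-Entity-Count is the number of entities on the current page and drops to 0 past the last page. The function name fetch_all_pages and the max_pages safety limit are hypothetical.

```python
def fetch_all_pages(session, url, page_size=50, max_pages=1000):
    """Request successive pages until the server reports an empty one.

    `session` is anything with a requests-style .get(url, params=...) method,
    for example requests.Session().
    """
    pages = []
    for page_number in range(1, max_pages + 1):  # assumption: pages start at 1
        response = session.get(
            url, params={"pageSize": page_size, "pageNumber": page_number}
        )
        response.raise_for_status()
        # Assumption: X-Entity-Count is the number of entities on this page
        # and is 0 once pageNumber runs past the last page.
        if int(response.headers.get("X-Entity-Count", 0)) == 0:
            break
        pages.append(response.json())  # parsed JSON for this page
    return pages


# Usage with the real library (url is your API endpoint):
#   import requests
#   with requests.Session() as session:
#       pages = fetch_all_pages(session, url)
```

Reusing one session keeps the connection alive between requests, which the 'Connection': 'keep-alive' header suggests the server supports. If the API turns out to report the end of the data differently, only the stop condition needs to change.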

answered Dec 25 '16 at 22:16

BSCowboy
