I'm downloading data from the Polygon API and, after checking the documentation, I realized there is a limit on response size: each request returns at most 5000 records. Say I need to download several months' worth of data; it looks like there is no one-liner that fetches all the data for the specified period at once.
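For context, what I have in mind is roughly the following chunked download loop. This is only a minimal sketch: the endpoint path, parameter names, window size, and key handling are assumptions based on my reading of the aggregates docs, not something I've verified.

import datetime as dt
import requests

API_KEY = "YOUR_KEY"  # placeholder
TICKER = "AAPL"

def fetch_window(start, end):
    # Assumed daily-aggregates endpoint and params; dates interpolate as YYYY-MM-DD.
    url = f"https://api.polygon.io/v2/aggs/ticker/{TICKER}/range/1/day/{start}/{end}"
    resp = requests.get(url, params={"apiKey": API_KEY, "limit": 5000})
    resp.raise_for_status()
    return resp.json().get("results", [])

start = dt.date(2020, 6, 1)
end = dt.date(2020, 9, 30)
window = dt.timedelta(days=30)  # small enough to stay under the 5000-record cap
cursor = start
while cursor <= end:
    chunk_end = min(cursor + window, end)
    records = fetch_window(cursor, chunk_end)
    # ... append `records` to a file here (see below)
    cursor = chunk_end + dt.timedelta(days=1)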
Here's what the response looks like for 4 daily data points, obtained with requests.get('query').json():
{
    "ticker": "AAPL",
    "status": "OK",
    "queryCount": 4,
    "resultsCount": 4,
    "adjusted": True,
    "results": [
        {
            "v": 152050116.0,
            "vw": 132.8458,
            "o": 132.76,
            "c": 134.18,
            "h": 134.8,
            "l": 130.53,
            "t": 1598932800000,
            "n": 1
        },
        {
            "v": 200117202.0,
            "vw": 131.6134,
            "o": 137.59,
            "c": 131.4,
            "h": 137.98,
            "l": 127,
            "t": 1599019200000,
            "n": 1
        },
        {
            "v": 257589206.0,
            "vw": 123.526,
            "o": 126.91,
            "c": 120.88,
            "h": 128.84,
            "l": 120.5,
            "t": 1599105600000,
            "n": 1
        },
        {
            "v": 336546289.0,
            "vw": 117.9427,
            "o": 120.07,
            "c": 120.96,
            "h": 123.7,
            "l": 110.89,
            "t": 1599192000000,
            "n": 1
        }
    ],
    "request_id": "bf5f3d5baa930697621b97269f9ccaeb"
}
I thought the fastest way would be to write the content as-is and process it later:
with open(out_file, 'a') as out:
    out.write(f'{response.json()["results"][0]}\n')
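An alternative I've considered (a rough sketch only, and it assumes the same out_file.txt name) is to dump every record in the response as one JSON object per line with json.dumps, rather than only the first result, so the file can later be parsed without eval:

import json

# one JSON object per line (JSON Lines), appended as each response arrives
with open('out_file.txt', 'a') as out:
    for record in response.json()["results"]:
        out.write(json.dumps(record) + '\n')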
Later, after downloading everything I need, I'll read the file back and convert the data to a JSON file using pandas:
import pandas as pd

pd.DataFrame([eval(item) for item in open('out_file.txt')]).to_json('out_file.json')
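If the file is written as JSON Lines, as in the sketch above, the read-back step could instead look like this (again assuming the out_file.txt / out_file.json names from above):

import pandas as pd

# pandas can read one JSON object per line directly, then re-export as a single JSON file
df = pd.read_json('out_file.txt', lines=True)
df.to_json('out_file.json')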
Is there a better way of achieving the same thing? If anyone is familiar with Scrapy feed exports: is there a way to dump the data to a JSON file during the run without keeping everything in memory, which I believe is how Scrapy operates?