I'm pulling data from a REST API. Because the data set is huge, the response is paginated. I've worked around that by first reading how many pages of data there are and then issuing one request per page. The problem is that there are around 1.5K pages, which takes a huge amount of time to fetch and append to a CSV. Is there a faster workaround for this?
This is the endpoint I'm targeting: https://developer.keeptruckin.com/reference#get-logs
import csv

import requests

url = 'https://api.keeptruckin.com/v1/logs?start_date=2019-03-09'
header = {'x-api-key': 'API KEY HERE'}

# First request: read the pagination info so we know how many pages to fetch.
r = requests.get(url, headers=header)
result = r.json()  # r.json() already parses the body; no separate json.loads needed
num_pages = result['pagination']['total']
print(num_pages)

csvheader = ['First Name', 'Last Name', 'Date', 'Time', 'Type', 'Location']
usernames = {'barmx1045', 'aposx001', 'mcqkl002', 'coudx014', 'ruscx013',
             'loumx001', 'robkr002', 'masgx009', 'coxed001', 'mcamx009',
             'linmx024', 'woldj002', 'fosbl004'}

# Open the file once instead of reopening it for every page.
with open('myfile.csv', 'a+', newline='') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    ##writer.writerow(csvheader)
    for page in range(1, num_pages + 1):  # start at 1 so the first page gets written too
        r = requests.get(url, headers=header, params={'page_no': page})
        result = r.json()
        for log in result['logs']:
            username = log['log']['driver']['username']
            first_name = log['log']['driver']['first_name']
            last_name = log['log']['driver']['last_name']
            for event in log['log']['events']:
                start_time = event['event']['start_time']
                date, time = start_time.split('T')
                event_type = event['event']['type']
                location = event['event']['location'] or 'N/A'
                if username in usernames:
                    writer.writerow((first_name, last_name, date, time, event_type, location))
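One idea I've been toying with is fetching the pages concurrently, since each page request is independent. Below is a minimal sketch using concurrent.futures.ThreadPoolExecutor from the standard library. The fetch_page helper and the worker count are my own inventions, and I'm assuming the API tolerates parallel requests under one key and that page_no behaves the same as above; I haven't confirmed either against the rate limits. Would this be the right direction, or is there a better approach?

    import csv
    from concurrent.futures import ThreadPoolExecutor

    import requests

    url = 'https://api.keeptruckin.com/v1/logs?start_date=2019-03-09'
    header = {'x-api-key': 'API KEY HERE'}

    def fetch_page(page):
        # Hypothetical helper: one GET per page, same params as the loop above.
        r = requests.get(url, headers=header, params={'page_no': page})
        r.raise_for_status()
        return r.json()

    num_pages = requests.get(url, headers=header).json()['pagination']['total']

    # 10 workers is a guess; the right number depends on the API's rate limits.
    with ThreadPoolExecutor(max_workers=10) as executor:
        # executor.map yields results in page order, so the CSV rows stay ordered.
        for result in executor.map(fetch_page, range(1, num_pages + 1)):
            for log in result['logs']:
                pass  # same row-building/writing logic as in the sequential version

Separately, if the endpoint supports requesting more records per page (I haven't checked whether a per_page parameter exists for this endpoint), that would cut the number of requests directly.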