I'm paginating through an API and saving each response to a dataframe.
I can collect 100 rows at a time, and this loop currently takes over an hour to run.
I suspect this is because, once the dataframe has grown past 100,000 rows, appending the next 100 becomes very inefficient.
Here's my current code:
import requests
import pandas as pd

while JSONContent['next'][0:10] > unixtime_yesterday:
    try:
        url = ...
        JSONContent = requests.request("GET", url).json()
        # keep only the three columns I need from this 100-row page
        temp_df = pd.json_normalize(JSONContent['data'])
        temp_df = temp_df[['email', 'datetime', 'Campaign Name']]
        # drop duplicates that appear within this page
        temp_df.drop_duplicates(subset=['email', 'Campaign Name'],
                                keep='last', inplace=True, ignore_index=True)
        # this returns a new, ever-larger dataframe every iteration
        df_received = df_received.append(temp_df, ignore_index=True, sort=False)
    except ValueError:
        # requests raises a JSONDecodeError (a subclass of ValueError) on a bad response body
        print('There was a JSONDecodeError')
To make this as efficient as possible, I only keep three columns from each response, and I drop any duplicates that appear within the 100 rows.
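For reference, here is a rough sketch of the alternative I'm considering: collecting each 100-row page in a plain Python list and calling pd.concat once after the loop, so the growing dataframe is never re-copied on every iteration. It assumes the same url construction, JSONContent initialisation and unixtime_yesterday value as above; is this the right direction?

import requests
import pandas as pd

chunks = []  # accumulate each 100-row page here instead of appending to df_received

while JSONContent['next'][0:10] > unixtime_yesterday:
    try:
        url = ...  # same paginated URL construction as above
        JSONContent = requests.request("GET", url).json()
        temp_df = pd.json_normalize(JSONContent['data'])
        temp_df = temp_df[['email', 'datetime', 'Campaign Name']]
        temp_df.drop_duplicates(subset=['email', 'Campaign Name'],
                                keep='last', inplace=True, ignore_index=True)
        chunks.append(temp_df)  # appending to a list is cheap; no dataframe copy here
    except ValueError:
        print('There was a JSONDecodeError')

# one concatenation at the end instead of 1,000+ incremental copies
if chunks:
    df_received = pd.concat(chunks, ignore_index=True, sort=False)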