My objective is to make a call to an API for each row in a Pandas DataFrame, which contains a List of strings in the response JSON, and creating a new DataFrame with one row per response. My code basically looks like this:
i = 0
new_df = pandas.DataFrame(columns = ['a','b','c','d'])
for index,row in df.iterrows():
url = 'http://myAPI/'
d = '{"SomeJSONData:"' + row['data'] + '}'
j = json.loads(d)
response = requests.post(url,json = j)
data = response.json()
for new_data in data['c']:
new_df.loc[i] = [row['a'],row['b'],row['c'],new_data]
i += 1
This works fine, but I'm making about 5500 API calls and writing about 6500 rows to the new DataFrame so it takes a while, maybe 10 minutes. I was wondering if anyone knew of a way to speed this up? I'm not too familiar with running parallel for loops in Python, could this be done while maintaining thread safety?