I'm trying to speed up API requests using multithreading. I don't understand why, but I often get the same API response for different calls (they should not have the same response). In the end I get a lot of duplicate rows in my output file, and a lot of other rows are missing.
Example: `requests.post("id=5555")` --> returns the response for `requests.post("id=444")` instead of the one for `requests.post("id=5555")`.
It looks like the workers pick up the wrong responses. Has anybody faced this issue?
```python
import concurrent.futures
import json
import time

import pandas as pd
import requests

# params, headerstemp, cookies, datatemp, usera and parse() are defined elsewhere (not shown)


def request_data(id, useragent):
    # - ADD ID to data and useragent to headers -
    time.sleep(0.2)
    resp = requests.post(
        URL,  # real endpoint omitted
        params=params,
        headers=headerstemp,
        cookies=cookies,
        data=datatemp,
    )
    return resp


df = pd.DataFrame(columns=["ID", "prenom", "nom", "adresse", "tel", "mail", "prem_dispo",
                           "capac_acc", "tarif_haut", "tarif_bas", "presentation", "agenda"])

ids = pd.read_csv('ids.csv')
ids.drop_duplicates(inplace=True)
ids = list(ids['0'].to_numpy())

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    future_to_url = {executor.submit(request_data, id, usera): id for id in ids}
    for future in concurrent.futures.as_completed(future_to_url):
        ok = False
        while not ok:
            # keep retrying result() until it returns without raising
            try:
                resp = future.result()
                ok = True
            except Exception as e:
                print(e)
        df.loc[len(df)] = parse(json.loads(resp.text))
```
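For reference, here is a minimal, self-contained sketch of the pattern I expected to be thread-safe, run against a public echo endpoint (httpbin.org is only used for illustration): each worker builds its own `data`/`headers` dicts inside the function instead of touching anything shared, and compares the id it sent with the id echoed back.

```python
import concurrent.futures

import requests

ECHO_URL = "https://httpbin.org/post"  # public echo endpoint, for illustration only


def fetch(record_id):
    # Build the payload locally for this call instead of mutating a shared dict.
    data = {"id": record_id}
    headers = {"User-Agent": "test-agent"}
    resp = requests.post(ECHO_URL, data=data, headers=headers, timeout=10)
    resp.raise_for_status()
    # httpbin echoes the submitted form fields back under "form".
    return record_id, resp.json()["form"]["id"]


with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(fetch, i) for i in range(20)]
    for future in concurrent.futures.as_completed(futures):
        sent, echoed = future.result()
        # If responses were getting mixed up, sent and echoed would differ.
        print(sent, echoed, "OK" if str(sent) == echoed else "MISMATCH")
```

Building the dicts inside the function matters here, since any dict shared between the submitted calls would be mutated concurrently by the worker threads.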
I also tried asyncio, following the first answer from "Multiple async requests simultaneously", but it returned the request object and not the API response...
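For completeness, this is roughly the shape of the asyncio variant I was aiming for, as a simplified sketch (it assumes aiohttp; the URL and payload are placeholders): the body only becomes available after awaiting `resp.json()` or `resp.text()`.

```python
import asyncio

import aiohttp

URL = "https://example.com/api"  # placeholder


async def fetch(session, record_id):
    # The body is only available after awaiting resp.json()/resp.text();
    # returning the bare response object gives no content, just the response wrapper.
    async with session.post(URL, data={"id": record_id}) as resp:
        return record_id, await resp.json()


async def main(ids):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, i) for i in ids]
        return await asyncio.gather(*tasks)


results = asyncio.run(main([444, 5555]))
print(results)
```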