I am trying to run a function that makes a requests.post call, taking its input from a pandas DataFrame and saving the response back into a different column of the same DataFrame.
import requests, json
import pandas as pd
import argparse
def postRequest(input, url):
    '''POST the JSON payload string in input to url and return the parsed JSON response'''
    headers = {'content-type': 'application/json'}
    r = requests.post(url=url, json=json.loads(input), headers=headers)
    response = r.json()
    return response
def payload(text):
    # build the proper JSON payload string for a given text
    # org_id and ver_id are defined elsewhere in my script
    std_payload = {"auth_key": "key",
                   "org": {"id": org_id, "name": "org"},
                   "ver": {"id": ver_id, "name": "ver"},
                   "mess": {"id": 80}}
    std_payload['mess']['text'] = text
    std_payload = json.dumps(std_payload)
    return std_payload
def find(df):
    # for every phrase, keep the original row plus one row per word,
    # with that word replaced by 'z'; "3174" marks originally-empty cells
    ff = pd.DataFrame(columns=['text', 'expected', 'word', 'payload', 'response'])
    count = 0
    for leng in range(len(df)):
        search = df.text[leng].split()
        ff.loc[count] = df.iloc[leng]
        ff.loc[count, 'word'] = 'originalphrase'
        count = count + 1
        for w in range(len(search)):
            if df.text[leng] == "3174":
                # empty-cell placeholder: add a single row with a blank expectation
                ff.loc[count, 'text'] = "3174"
                ff.loc[count, 'word'] = None
                ff.loc[count, 'expected'] = '[]'
                continue
            word = search[:]
            ff.loc[count, 'word'] = word[w]
            word[w] = 'z'
            phrase = ' '.join(word)
            ff.loc[count, 'text'] = phrase
            ff.loc[count, 'expected'] = df.loc[leng, 'expected']
            count = count + 1
    return ff
# read in csv of phrases to be tested; filename and url are defined elsewhere
df = pd.read_csv(filename, engine='python')
# allow empty cells by setting them to the placeholder phrase "3174"
df = df.fillna("3174")
sf = find(df)
for i in sf.index:
    sf.loc[i, 'payload'] = payload(sf.text[i])
for index in sf.index:
    sf.loc[index, 'response'] = postRequest(sf.payload[index], url)
From all my tests, this runs over the DataFrame one row at a time, which can take a few hours when the DataFrame is large.
Searching online for ways to run things in parallel turned up a few methods, but I do not understand what those methods are doing, even though I can get the examples themselves to work. I have seen pooling and threading examples such as Simultaneously run POST in Python and Asynchronous Requests with Python requests. When I try to apply them to my code, specifically to postRequest, it still goes one by one. Can anyone help me get the parallelization working correctly? If more information is required, please let me know.
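For reference, the basic pattern from those examples that I can get to work on its own looks roughly like this (a simplified sketch; the URL and payloads here are placeholders, not my real ones):

import concurrent.futures
import requests

def post_one(data, url):
    # each worker thread makes one blocking POST and returns the parsed JSON
    return requests.post(url, json=data, timeout=30).json()

items = [{"text": "first"}, {"text": "second"}]  # placeholder payloads
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(post_one, d, "http://example.com/api") for d in items]
    results = [f.result() for f in futures]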
Thanks
Edit: here is the last thing I was working with:
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # map each future back to the row it was submitted for
    future_to_index = {executor.submit(postRequest, sf.payload[index], url): index
                       for index in range(10)}
    for future in concurrent.futures.as_completed(future_to_index):
        index = future_to_index[future]
        # futures finish in arbitrary order, so write back by row index
        sf.loc[index, 'response'] = future.result()
Also, the DataFrame has anywhere between 2,000 and 4,000 rows, so doing it in sequence can take up to 4 hours.
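For completeness, another pattern I have seen (a sketch, assuming the same sf, postRequest, and url as above) uses executor.map, which yields results in submission order, so they can be assigned straight back to the column. I am not sure whether this is better than the as_completed approach:

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # map() yields results in input order, so they line up with sf's rows
    responses = list(executor.map(lambda p: postRequest(p, url), sf.payload))
sf['response'] = responses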