
Hi all, I am trying to pass input from a CSV file to my URL. The file contains a million records, so the response is very slow; it takes a lot of time and sometimes gets timed out. Can anyone help me make it faster? I can provide the code but not the data and URL, as they are confidential. Here is my code:

import pandas as pd
import requests

# Read the input file (the question mentions a CSV, but the code reads an
# Excel file; pd.read_csv('C:/data1.csv') would be the CSV equivalent).
df = pd.read_excel('C:/data1.xlsx', index_col=None)
df = df[['NAME', 'country']]

passwd = b"abc"
user = b"xxxxx"
# Make a request to the endpoint using the correct auth values
auth_values = (user, passwd)

results = []
for name, country in df.values:
    # Note the URL scheme and query separators: 'http://' and '?name=...'
    url = 'http://xyz.com/?name=' + str(name) + '&country=' + str(country)
    resp = requests.get(url, auth=auth_values)
    results.append(resp.json())

Please modify this code to make it faster.

Thanks in advance for your help.

  • 1) Your code does the GETs sequentially, so being slow is understandable. Timing out is not really expected, so probably something is wrong with the URL or something else is causing the timeout. 2) Try looking at multithreading? Multithreading pools are very well equipped for this kind of problem. – Jason Chia Jan 27 '20 at 13:12
  • I tried multithreading in my code but it was also not working; maybe I was doing something wrong. Can you please modify this code? – Ankit Kumar Sharma Jan 27 '20 at 13:16
  • Lots of sources on how to do multithreading. https://stackoverflow.com/questions/2846653/how-can-i-use-threading-in-python This one can be quickly broken down to do what you need. – Jason Chia Jan 27 '20 at 13:19
  • I tried using threading but I am getting an error: "the truth value of an array with more than one element is ambiguous". As I don't have any function here, I am using the DataFrame as the target, since I am passing input from the Excel file to the URL parameter. – Ankit Kumar Sharma Jan 28 '20 at 06:24
  • Threading is still slow. Can you tell me what else I can use to make it faster for a million records? – Ankit Kumar Sharma Jan 28 '20 at 06:27
  • Slow and fast are relative. How many threads are you using? Are you using a pool? Check your CPU stats: is your issue CPU bound, IO bound, or network bound, etc.? No idea where that array error is coming from, though. – Jason Chia Jan 28 '20 at 08:52
  • As you can see in my code, I am passing the data from the Excel file to the URL query parameters, and these fields contain a million rows. My issue is that I am getting a very slow response from the URL. I didn't use any thread pool; I just invoked the thread function: t = threading.Thread(target=dd) and thread.append(t). I am very new to this threading concept in Python. – Ankit Kumar Sharma Jan 28 '20 at 09:35
  • It would be good if you could just modify my code using threads; that would be very helpful. – Ankit Kumar Sharma Jan 28 '20 at 09:38
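As an aside on the "truth value of an array is ambiguous" error mentioned in the comments: it comes from passing the NumPy array dd as the thread target, but threading.Thread expects a callable. A minimal runnable sketch of the correct shape, where process is a hypothetical function standing in for the real request loop:

```python
import threading

results = []

def process(rows):
    # A real version would call requests.get(...) for each row here.
    for name, country in rows:
        results.append((name, country))

rows = [("Alice", "US"), ("Bob", "DE")]

# The target must be a callable; the data goes in via args.
t = threading.Thread(target=process, args=(rows,))
t.start()
t.join()
```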

1 Answer


Try this for threading.

import time
from concurrent.futures import ThreadPoolExecutor

executors = ThreadPoolExecutor(max_workers=n)  # n = int number of threads
running_threads = []

# To submit a job:
run_threads = executors.submit(func, foo)  # func is your function, foo its argument
running_threads.append(run_threads)

# To wait for all submitted jobs to end:
while True:
    if all(i.done() for i in running_threads):
        print("all done")
        break
    else:
        time.sleep(1)
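Applied to the loop in the question, the pattern might look like the runnable sketch below. The fetch function here is a stand-in for the real call (which would be something like requests.get(url, auth=auth_values).json() against the confidential endpoint), and the rows are sample data; in practice, sharing a requests.Session across workers would also cut per-request connection overhead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(row):
    # Stand-in for the real HTTP request; returns a dict like resp.json() would.
    name, country = row
    return {"name": name, "country": country}

# Sample rows; in the question this would be df.values from the Excel file.
rows = [("Alice", "US"), ("Bob", "DE"), ("Carol", "IN")]

results = []
with ThreadPoolExecutor(max_workers=8) as pool:
    # Submit one job per row, then collect results as they complete.
    futures = [pool.submit(fetch, row) for row in rows]
    for fut in as_completed(futures):
        results.append(fut.result())
```

For a million rows it may also be worth submitting in batches (e.g. a few thousand rows per chunk) rather than creating a million futures at once.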
Jason Chia