
I have code that gets an auth code from one request, passes it into the payload of a second request to get another response, and runs multiple threads of it to increase speed. Right now I am using threading to increase the number of requests being made, but there is no significant difference between running 1 request and 6 threaded requests. How should I improve my code so that the request process is faster?

#--------------------------------------------------------
#scraper
import pandas as pd
import requests

def checkaccount(phonelist, path):
    #check if file exists, if not create file with headers
    try:
        pd.read_csv(path)
    except FileNotFoundError:
        df = pd.DataFrame(columns=['telephone', 'result'])
        df.to_csv(path, mode='w', index=False, header=True)

    #loop over the telephone numbers (must be inside the function,
    #otherwise phonelist is undefined at module level)
    for tel in phonelist:
        #gets token
        url = xxxxxxxx
        payload = {'phone': tel}
        #requests.get takes params=, not payload=
        token = requests.get(url, params=payload).text
        payloadtoken = {'token': token,
                        'phone': tel}

        #gets result
        result1 = requests.get(url, params=payloadtoken).text

        #clean data and append to csv
        dat = {'telephone': tel,        #was 'telephone', the whole DataFrame
               'result': result1[0]}
        data = pd.DataFrame([dat])      #wrap in a list to build a one-row frame
        data.to_csv(path, mode='a', index=False, header=False)
#------------------------------------------------------------


import threading
import schedule
from time import sleep

#gets telephone data to be passed into the payload
telephone = pd.read_csv('tel.csv')
length = len(telephone['msisdn'].tolist())

#gets number to break the list into parts to run
q1 = int(length/6)
q2 = int(length/6*2)
q3 = int(length/6*3)
q4 = int(length/6*4)
q5 = int(length/6*5)
def run_threaded(job_func):
    job_thread = threading.Thread(target=job_func)
    job_thread.start()

#schedule each chunk to run in its own thread at a specific time
schedule.every().day.at('23:58').do(run_threaded,lambda:checkaccount(telephone['msisdn'][0:q1],"data1.csv")) 
schedule.every().day.at('23:58').do(run_threaded,lambda:checkaccount(telephone['msisdn'][q1:q2],"data2.csv")) 
schedule.every().day.at('23:58').do(run_threaded,lambda:checkaccount(telephone['msisdn'][q2:q3],"data3.csv")) 
schedule.every().day.at('23:58').do(run_threaded,lambda:checkaccount(telephone['msisdn'][q3:q4],"data4.csv")) 
schedule.every().day.at('23:58').do(run_threaded,lambda:checkaccount(telephone['msisdn'][q4:q5],"data5.csv")) 
schedule.every().day.at('23:58').do(run_threaded,lambda:checkaccount(telephone['msisdn'][q5:],"data6.csv")) 

while True:
    schedule.run_pending()
    sleep(1)

Is there a way to optimize the above code so that the requests run faster?
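For reference, here is the fan-out pattern I am aiming for, sketched with `concurrent.futures.ThreadPoolExecutor`. One task per number (instead of 6 fixed chunks) means a slow number does not hold up a whole chunk, and `max_workers` controls how many requests are in flight. `check_one` is a placeholder that just echoes the number so the sketch runs without the real API:

```python
from concurrent.futures import ThreadPoolExecutor

def check_one(tel):
    # placeholder for the two requests.get calls in checkaccount;
    # here it just echoes the number so the sketch is runnable
    return (tel, str(tel)[0])

phonelist = [60123456789, 60198765432, 60171112222]  # dummy numbers

# one task per number; pool.map preserves input order in its results
with ThreadPoolExecutor(max_workers=20) as pool:
    rows = list(pool.map(check_one, phonelist))

print(rows)
```

Each worker could append to its own CSV (one file per worker, as in the `data1.csv`…`data6.csv` scheme above) to avoid interleaved writes to a shared file.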

  • You can try and look at the aiohttp library in python. (https://docs.aiohttp.org/en/stable/client_quickstart.html) – Signor Sep 24 '22 at 17:26
  • Let me try that, thank you for the share. My question is: will appending all results into one CSV be a problem? Should I create an individual CSV for each pool? – gndps Sep 24 '22 at 17:27
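The async approach suggested in the comment can be sketched as follows. With aiohttp, `fetch_result` would make two awaited `session.get()` calls (token, then result); they are simulated here with `asyncio.sleep` so the skeleton runs without a server, and `fetch_result`/`main` are illustrative names, not part of any library:

```python
import asyncio

async def fetch_result(tel):
    # with aiohttp this would be: await session.get(url, params=...)
    await asyncio.sleep(0.01)          # stands in for the token request
    token = f"token-{tel}"
    await asyncio.sleep(0.01)          # stands in for the result request
    return (tel, token)

async def main(phonelist):
    # every number is in flight concurrently instead of 6 sequential chunks;
    # gather preserves the input order in its results
    return await asyncio.gather(*(fetch_result(t) for t in phonelist))

rows = asyncio.run(main([111, 222, 333]))
print(rows)
```

The key difference from the threaded version is that the event loop overlaps the wait time of every request, so throughput is bounded by the server rather than by thread count.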
