
My dataframe `df` contains over 600 URLs, and I want to get a specific value from an element on each page. This code works fine for that:

    # requests session `s`, `cookies` and the dataframe `df` are defined earlier
    from bs4 import BeautifulSoup
    from tqdm import tqdm

    ownerlist = []
    for links in tqdm(df['Link'], leave=False, position=0):
        ownersite = s.get(links, cookies=cookies)
        owsoup = BeautifulSoup(ownersite.content, 'lxml')
        owner = owsoup.find('input', {'id': 'GlobalBodyContent_InternalBodyContent_BodyContent_Owner'}).get('value')
        ownerlist.append(owner)
        #print(len(ownerlist), owner)
    df['Owner'] = ownerlist
    print(df)

But it takes up to 40 minutes to get all the requests done. I tried a multithreaded approach, but I can't get it to work: it runs faster, but instead of 600+ items I only have 2 or 3 in my list afterwards. I tried:

    import threading

    owner = []

    def mt(links):
        ap = s.get(links, cookies=cookies)
        apsoup = BeautifulSoup(ap.content, 'lxml')
        ap1 = apsoup.find('input', {'id': 'GlobalBodyContent_InternalBodyContent_BodyContent_Owner'}).get('value')
        #print(ap1)
        owner.append(ap1)

    def main():
        for links in tqdm(df['Link']):
            threadProcess = threading.Thread(name='simplethread', target=mt, args=[links])
            threadProcess.daemon = True
            threadProcess.start()

    main()

How can I run this loop in less than 40 minutes? Thanks!

Alex
  • Can this help you: https://stackoverflow.com/questions/16181121/a-very-simple-multithreading-parallel-url-fetching-without-queue – CyDevos Sep 22 '21 at 19:54
  • My guess is you aren't waiting for the threads to finish. So, when you checked `owner`, only a few of the threads had finished and the rest were still working. You should not be using `daemon`, because you need to `join` all of those to know when they are done. – Tim Roberts Sep 22 '21 at 20:00
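
To illustrate what both comments suggest, here is a minimal sketch (not tested against the site; `max_workers=20` is an arbitrary choice) that replaces the hand-rolled threads with `concurrent.futures.ThreadPoolExecutor`. The `with` block waits for all workers to finish before continuing, and `executor.map` returns results in input order, so they line up with `df['Link']`:

    # Sketch based on the comments above: ThreadPoolExecutor joins the
    # workers for you, and executor.map preserves the input order.
    # s, cookies, and df are assumed to be defined as in the question.
    from concurrent.futures import ThreadPoolExecutor
    from bs4 import BeautifulSoup

    def fetch_owner(link):
        resp = s.get(link, cookies=cookies)
        soup = BeautifulSoup(resp.content, 'lxml')
        return soup.find('input', {'id': 'GlobalBodyContent_InternalBodyContent_BodyContent_Owner'}).get('value')

    with ThreadPoolExecutor(max_workers=20) as executor:
        # map blocks until every URL has been fetched, so there are no
        # daemon threads and no partially filled result list
        df['Owner'] = list(executor.map(fetch_owner, df['Link']))

A pool also caps the number of simultaneous connections, which avoids starting 600+ threads at once the way the loop in the question does.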

0 Answers