
This is the function that reads a URL and parses the response into a BeautifulSoup object:

import requests
from bs4 import BeautifulSoup

multithreadding = []

def scraper_worker(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")
    data = soup.find("div", {"class": "main-container"})
    multithreadding.append(data)

from threading import Thread

threadding = []

for u in split_link:
    t = Thread(target=scraper_worker, args=(u,))
    t.start()
    threadding.append(t)

split_link is the list where roughly 50 links are stored. I am facing a problem running the multithreading part.

Abhishek kumar
dublinduke
  • put this in the question, it will be more readable. – furas Dec 28 '17 at 10:06
  • What problem? Do you get an error message? Always put the full error message (traceback) in the question (as text, not a screenshot). It contains other useful information. – furas Dec 28 '17 at 10:07
  • Maybe you should use a queue to send results to the main thread, which will add them to the list. You could also use a queue to send the next URL to each thread, so you could run 10 threads instead of 50. – furas Dec 28 '17 at 10:10
  • See also [ThreadPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example) to use fewer threads at the same time. – furas Dec 28 '17 at 10:13
  • Could you tell me how to use a queue to send the result to the main thread, which will eventually add the result to the list? – dublinduke Dec 29 '17 at 04:40
  • I made an example with `queue` in a `thread` for code with `tkinter`: https://stackoverflow.com/a/48021054/1832058 . But you have to explain in the question what problem you have with the "multithreadding" part. Maybe you need a different solution. – furas Dec 30 '17 at 05:41
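The ThreadPoolExecutor route linked in the comments can be sketched roughly like this. Note that `scrape` here is a stand-in for the real request-and-parse code, and the URLs are placeholders, not the asker's actual links:

```python
from concurrent.futures import ThreadPoolExecutor

# stand-in for the real scraper_worker; it returns its result
# instead of appending to a shared list
def scrape(url):
    return "data from " + url

urls = ["http://example.com/page/%d" % i for i in range(1, 6)]

# max_workers caps the number of simultaneous threads, so 50 links
# never spawn 50 threads at once
with ThreadPoolExecutor(max_workers=3) as executor:
    # map() returns results in the same order as the input urls
    results = list(executor.map(scrape, urls))

print(results[0])
```

Replacing the body of `scrape` with the `requests.get` / `BeautifulSoup` code from the question keeps the rest of the sketch unchanged.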

1 Answer


This is an example of how to use a queue to send results from worker threads back to the main thread.

import requests
from bs4 import BeautifulSoup
from threading import Thread
import queue

# --- functions ---

def worker(url, q):  # the queue is passed in as an argument
    r = requests.get(url)

    soup = BeautifulSoup(r.text, "html.parser")
    data = soup.find("span", {"class": "text"}).get_text()

    # send the result to the main thread using the queue
    q.put(data)

# --- main ---

all_links = [
    'http://quotes.toscrape.com/page/' + str(i) for i in range(1, 11)
]

all_threads = []
all_results = []
my_queue = queue.Queue()

# run threads
for url in all_links:
    t = Thread(target=worker, args=(url, my_queue))
    t.start()
    all_threads.append(t)

# get results from the queue
while len(all_results) < len(all_links):
    # queue.get() blocks until data is available
    data = my_queue.get()
    all_results.append(data)

    # use my_queue.empty() instead if the loop has to do other work,
    # because queue.get() waits for data and would block the loop:
    #
    #if not my_queue.empty():
    #    data = my_queue.get()
    #    all_results.append(data)

# wait for all threads to finish
for t in all_threads:
    t.join()

# display results
for item in all_results:
    print(item[:50], '...')
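The comments also suggest the reverse use of a queue: feed the URLs to a fixed number of threads so that 50 links do not spawn 50 threads. A minimal sketch of that worker-pool pattern, with a placeholder string standing in for the real `requests.get` call:

```python
import queue
from threading import Thread

NUM_WORKERS = 3

def pool_worker(url_q, result_q):
    # each worker keeps pulling URLs until it sees the None sentinel
    while True:
        url = url_q.get()
        if url is None:
            break
        # a real worker would fetch and parse the page here
        result_q.put("scraped " + url)

urls = ["http://example.com/page/%d" % i for i in range(1, 11)]

url_q = queue.Queue()
result_q = queue.Queue()

for url in urls:
    url_q.put(url)
for _ in range(NUM_WORKERS):
    url_q.put(None)  # one sentinel per worker so every thread stops

threads = [Thread(target=pool_worker, args=(url_q, result_q))
           for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# all workers have finished, so draining the queue is safe now
results = []
while not result_q.empty():
    results.append(result_q.get())

print(len(results))  # 10
```

The sentinel values mark the end of the work; because the queue is FIFO, every URL is processed before any worker dequeues a sentinel and exits.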
furas