I have a list of links, and I want to scrape them all with 5 threads. I managed to start one thread per link, but I will have over 1000 links, and I don't want my computer or the website to break.
My question is: how do I use a fixed number of threads to scrape, say, 100 links?
Here is what I have right now:
import threading
import urllib2
from bs4 import BeautifulSoup

def main():
    urls = ["http://google.com", "http://yahoo.com"]
    threads = []
    # Start one thread per URL
    for url in urls:
        thread = threading.Thread(target=loadurl, args=(url,))
        thread.start()
        print "[+] Thread started for:", url
        threads.append(thread)
    # All requests started
    print "[+] Requests done"
    # Wait for every thread to finish
    for thread in threads:
        thread.join()
    print "[+] Finished!"

# Fetch a page and just print its parsed source
def loadurl(url):
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page)
    print soup

if __name__ == "__main__":
    main()
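
From what I've read, I think what I want is a fixed pool of worker threads that all pull URLs from a shared queue, instead of one thread per link. Below is my rough, untested sketch of that idea using the standard Queue module (NUM_WORKERS, worker, and the None sentinel are just my own guesses, not something from my code above). Is this the right direction?

import threading
import urllib2
from Queue import Queue
from bs4 import BeautifulSoup

NUM_WORKERS = 5  # fixed number of threads; my own guess at a sensible limit

def worker(queue):
    # Each worker loops, taking one URL at a time from the shared queue
    while True:
        url = queue.get()
        if url is None:
            # Sentinel value: no more work, so this worker exits
            break
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page)
        print soup

def main():
    urls = ["http://google.com", "http://yahoo.com"]  # will be 1000+ links
    queue = Queue()
    for url in urls:
        queue.put(url)
    # One sentinel per worker so each one knows when to stop
    for _ in range(NUM_WORKERS):
        queue.put(None)
    # Start exactly NUM_WORKERS threads, no matter how many URLs there are
    threads = []
    for _ in range(NUM_WORKERS):
        thread = threading.Thread(target=worker, args=(queue,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    print "[+] Finished!"

if __name__ == "__main__":
    main()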