
I'm scraping company names from each URL stored in a CSV file. How can I speed this up with multi-threading?

from bs4 import BeautifulSoup
import requests
import csv

with open("urls.csv", "r") as f_urls, open("results.csv", "w", newline="") as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['url', 'results'])

    for url in f_urls:
        url = url.strip()
        html = requests.get(url).content
        soup = BeautifulSoup(html, "html.parser")
        # find() returns a Tag (or None); take its text rather than
        # writing the raw tag markup into the CSV
        h1 = soup.find('h1')
        company_name = h1.get_text(strip=True) if h1 else ''
        csv_output.writerow([url, company_name])
  • What is your question? Is there something you are having trouble with related to multi-threading? Please be more specific. – Karl Oct 20 '18 at 20:18
  • http://idownvotedbecau.se/noattempt/ - Also, the contents of your post should include some explanation of what you are trying to do along with your input data – OneCricketeer Oct 20 '18 at 20:18
  • By the way, only one thread/process can write out to the file handle at once, so your code will be blocking on it anyway – OneCricketeer Oct 20 '18 at 20:19

1 Answer


The 'heavy' part of your code is html = requests.get(url).content: it downloads the website, and the loop sits idle waiting for each response. To speed the code up, you want to download multiple websites simultaneously.

Look into asyncio or this post: https://stackoverflow.com/a/40392029/47351
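For concreteness, here is a minimal sketch of that idea using concurrent.futures.ThreadPoolExecutor, a thread-pool approach in the spirit of the linked post; the worker count and timeout are illustrative choices, not something your setup dictates.

import csv
from concurrent.futures import ThreadPoolExecutor

import requests
from bs4 import BeautifulSoup

def fetch_company_name(url):
    # Each worker downloads and parses one page; the time is spent
    # waiting on the network, and threads let those waits overlap.
    try:
        html = requests.get(url, timeout=10).content
    except requests.RequestException as exc:
        return url, f"error: {exc}"
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    return url, h1.get_text(strip=True) if h1 else ""

with open("urls.csv") as f_urls:
    urls = [line.strip() for line in f_urls if line.strip()]

with open("results.csv", "w", newline="") as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['url', 'results'])
    # map() yields results in input order, and only the main thread
    # ever touches the CSV writer, so the single-writer point raised
    # in the comments is not an issue here.
    with ThreadPoolExecutor(max_workers=10) as pool:
        for url, name in pool.map(fetch_company_name, urls):
            csv_output.writerow([url, name])

Ten workers is only a starting point for I/O-bound work like this; tune it to how many simultaneous requests the target sites will tolerate.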

RvdK