
I'm scraping company names from each URL stored in a CSV file. How can I speed this up with multi-threading?

from bs4 import BeautifulSoup
import requests
import csv

with open("urls.csv", "r") as f_urls, open("results.csv", "w", newline="") as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['url', 'results'])

    for url in f_urls:
        url = url.strip()
        html = requests.get(url).content
        soup = BeautifulSoup(html, "html.parser")
        # find() returns a Tag (or None); take its text rather than
        # writing the raw tag markup into the CSV
        h1 = soup.find('h1')
        company_name = h1.get_text(strip=True) if h1 else ''
        csv_output.writerow([url, company_name])
  • What is your question? Is there something you are having trouble with related to multi-threading? Please be more specific. – Karl Oct 20 '18 at 20:18
  • http://idownvotedbecau.se/noattempt/ - Also, the contents of your post should include some explanation of what you are trying to do along with your input data – OneCricketeer Oct 20 '18 at 20:18
  • By the way, only one thread/process can write out to the file handle at once, so your code will be blocking on it anyway – OneCricketeer Oct 20 '18 at 20:19

1 Answer


The 'heavy' part of your code is html = requests.get(url).content: it downloads the website, and the loop sits idle waiting for each response. To speed the code up, you want to download multiple websites simultaneously.

Look into asyncio or this post: https://stackoverflow.com/a/40392029/47351
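For concreteness, here is a minimal sketch of that idea using concurrent.futures.ThreadPoolExecutor, a thread-pool approach in the spirit of the linked post; the worker count and timeout are illustrative choices, not something your setup dictates.

import csv
from concurrent.futures import ThreadPoolExecutor

import requests
from bs4 import BeautifulSoup

def fetch_company_name(url):
    # Each worker downloads and parses one page; the time is spent
    # waiting on the network, and threads let those waits overlap.
    try:
        html = requests.get(url, timeout=10).content
    except requests.RequestException as exc:
        return url, f"error: {exc}"
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    return url, h1.get_text(strip=True) if h1 else ""

with open("urls.csv") as f_urls:
    urls = [line.strip() for line in f_urls if line.strip()]

with open("results.csv", "w", newline="") as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['url', 'results'])
    # map() yields results in input order, and only the main thread
    # ever touches the CSV writer, so the single-writer point raised
    # in the comments is not an issue here.
    with ThreadPoolExecutor(max_workers=10) as pool:
        for url, name in pool.map(fetch_company_name, urls):
            csv_output.writerow([url, name])

Ten workers is only a starting point for I/O-bound work like this; tune it to how many simultaneous requests the target sites will tolerate.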

RvdK