I have set up BeautifulSoup to find a specific class on two webpages.

How can I write each URL's result to its own cell in a single CSV file?

Also, is there a limit on the number of URLs I can read? I would like to expand this to about 200 URLs once I get it working.

The class is always the same, and I don't need any formatting, just the raw HTML in one cell per URL.

Thanks for any ideas.

from bs4 import BeautifulSoup
import requests

urls = ['https://www.ozbargain.com.au/', 'https://www.ozbargain.com.au/forum']
for u in urls:
    response = requests.get(u)
    data = response.text
    soup = BeautifulSoup(data, 'lxml')
    # First <div class="block"> on the page; this is the value that should end up in the CSV
    block = soup.find('div', class_="block")
Gledi
DTrain
  • Your question is unclear and lacks detail. What is the desired output? Which class specifically? There are many elements with the same class name but different attributes. – αԋɱҽԃ αмєяιcαη Aug 23 '20 at 12:12

1 Answer

Use pandas to work with tabular data: pd.DataFrame creates a table, and DataFrame.to_csv saves it as CSV (the documentation is also worth checking, for example for append mode).

That's basically it.

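import requests
import pandas as pd
from bs4 import BeautifulSoup


def func(urls):
    """Yield one row per URL: the URL and the raw HTML of its first div.block."""
    for url in urls:
        data = requests.get(url).text
        soup = BeautifulSoup(data, 'lxml')
        # find() returns a Tag, or None when the page has no matching div
        block = soup.find('div', class_="block")
        yield {
            # str() converts the Tag to its raw HTML; a missing block becomes an empty cell
            "url": url, "raw_html": str(block) if block is not None else ""
        }


urls = ['https://www.ozbargain.com.au/', 'https://www.ozbargain.com.au/forum']

# Materialize the generator so each dict becomes one row of the table
data = list(func(urls))
table = pd.DataFrame(data)
table.to_csv("output.csv", index=False)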
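
If you would rather not hold all 200 pages in memory, or want to add results in batches, here is a minimal sketch of the append idea using only the standard csv module instead of pandas. The helper names append_rows and read_rows and the example.com URLs are made up for illustration, and the HTML strings stand in for whatever str(soup.find(...)) returns. The csv module quotes each field, so raw HTML containing commas, quotes, or newlines should still land in a single cell per URL.

```python
import csv
import os


def append_rows(path, rows):
    """Append (url, raw_html) pairs to a CSV file, one row per URL."""
    # Write the header only when the file is new or empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["url", "raw_html"])
        for url, html in rows:
            # A missing block (None) becomes an empty cell.
            writer.writerow([url, "" if html is None else str(html)])


def read_rows(path):
    """Read the file back as a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Hypothetical data standing in for scraped pages.
    sample = '<div class="block">\n  <a href="/x">a, b</a>\n</div>'
    append_rows("output_demo.csv", [("https://example.com/a", sample)])
    append_rows("output_demo.csv", [("https://example.com/b", None)])
    for row in read_rows("output_demo.csv"):
        print(row["url"], len(row["raw_html"]))
```

Because the file is opened in append mode, you can call append_rows once per page or once per batch and an interrupted run keeps the rows already written.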
help-ukraine-now