
I'm fairly new to Beautiful Soup/Python/web scraping. I have been able to scrape data from a site, but I am only able to export the very first row to a CSV file (I want to export all of the scraped data into the file).

I am stumped on how to make this code export ALL scraped data into multiple individual rows:

import pandas as pd
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

r = requests.get("https://www.infoplease.com/primary-sources/government/presidential-speeches/state-union-addresses")
data = r.content  # Content of response
soup = BeautifulSoup(data, "html.parser")


for span in soup.find_all("span", {"class": "article"}):
    for link in span.select("a"):

        name_and_date = link.text.split('(')
        name = name_and_date[0].strip()
        date = name_and_date[1].replace(')', '').strip()

        base_url = "https://www.infoplease.com"
        links = link['href']
        links = urljoin(base_url, links)

    pres_data = {'Name': [name],
                 'Date': [date],
                 'Link': [links]
                 }

    df = pd.DataFrame(pres_data, columns=['Name', 'Date', 'Link'])

    df.to_csv(r'C:\Users\ThinkPad\Documents\data_file.csv', index=False, header=True)

    print(df)

Any ideas here? I believe I need to collect each set of values inside the parsing loop and push them into the output one at a time. Am I going about this the right way?

Thanks for any insight

quizno
  • Does this answer your question? [How to add pandas data to an existing csv file?](https://stackoverflow.com/questions/17530542/how-to-add-pandas-data-to-an-existing-csv-file) – Macattack Feb 23 '21 at 00:23
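
For reference, the approach from that linked question boils down to passing mode='a' to to_csv so each write appends instead of overwriting. A minimal sketch, assuming the file path and columns from the question (the row values here are hypothetical placeholders):

import pandas as pd

# Hypothetical single row matching the question's columns
row = pd.DataFrame({'Name': ['example'], 'Date': ['example'], 'Link': ['example']})

row.to_csv(r'C:\Users\ThinkPad\Documents\data_file.csv',
           mode='a',       # append to the file instead of overwriting it
           index=False,
           header=False)   # the header was already written on the first write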

2 Answers


The way it is currently set up, you are not adding each link as a new entry; the dictionary and DataFrame are rebuilt on every pass through the outer loop, so only the last link survives. If you initialize a list before the loops and append a dictionary to it on each iteration of the inner "links" for loop, you will keep every row and not just the last one.

import pandas as pd
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

r = requests.get("https://www.infoplease.com/primary-sources/government/presidential-speeches/state-union-addresses")
data = r.content  # Content of response
soup = BeautifulSoup(data, "html.parser")

pres_data = []  # one dict per link, collected across all iterations
for span in soup.find_all("span", {"class": "article"}):
    for link in span.select("a"):

        # The link text looks like "Name (Date)"; split it into the two parts
        name_and_date = link.text.split('(')
        name = name_and_date[0].strip()
        date = name_and_date[1].replace(')', '').strip()

        # Resolve the relative href against the site root
        base_url = "https://www.infoplease.com"
        links = link['href']
        links = urljoin(base_url, links)

        this_data = {'Name': name,
                     'Date': date,
                     'Link': links
                     }
        pres_data.append(this_data)

# Build the DataFrame once, after all rows have been collected
df = pd.DataFrame(pres_data, columns=['Name', 'Date', 'Link'])

df.to_csv(r'C:\Users\ThinkPad\Documents\data_file.csv', index=False, header=True)

print(df)
j-berg

You don't need to use pandas here, since you aren't applying any data operations to what you scrape.

As a rule, try to limit yourself to the built-in libraries when the task is this short; the csv module covers the output.

import requests
from bs4 import BeautifulSoup
import csv


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    # Each row is [href, name, date]: strip the trailing ')' from the link
    # text and split "Name (Date" into its two parts. Note that the hrefs
    # are written as-is, i.e. relative to the site root.
    target = [([x.a['href']] + x.a.text[:-1].split(' ('))
              for x in soup.select('span.article')]
    with open('data.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Url', 'Name', 'Date'])
        writer.writerows(target)


main('https://www.infoplease.com/primary-sources/government/presidential-speeches/state-union-addresses')
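
One caveat: unlike the first answer, this writes the relative href values into the CSV. If you want absolute URLs instead, one option (a sketch, not part of the original answer) is to resolve each href against the page URL with urljoin:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import csv


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    # Same comprehension as above, but with each href resolved to an
    # absolute URL using the page URL passed into main()
    target = [([urljoin(url, x.a['href'])] + x.a.text[:-1].split(' ('))
              for x in soup.select('span.article')]
    with open('data.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Url', 'Name', 'Date'])
        writer.writerows(target)


main('https://www.infoplease.com/primary-sources/government/presidential-speeches/state-union-addresses')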

Sample of output: a data.csv file with Url, Name, and Date columns (screenshot omitted).