
My problem is that only the data from the most recent URL request is saved. How can I save all the responses? I tried `df.to_csv('complete.csv', 'a')`, but that creates a jumbled file.
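(Side note on that attempt: in the pandas versions current at the time, the second positional argument of `to_csv` is `sep`, so `'a'` was taken as the field delimiter, which would explain the jumble. File-append mode needs the `mode` keyword instead. A minimal sketch with made-up rows:)

```python
import pandas as pd

# Hypothetical rows standing in for two scraped batches.
df1 = pd.DataFrame({'linecode': ['MOTORCRAFT'], 'partno': ['BR1258A']})
df2 = pd.DataFrame({'linecode': ['WAGNER'], 'partno': ['ZD1414']})

# First write creates the file with a header row...
df1.to_csv('complete.csv', index=False)
# ...later writes append rows, suppressing the repeated header.
df2.to_csv('complete.csv', mode='a', header=False, index=False)

print(pd.read_csv('complete.csv'))
```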

# imports
import requests
from bs4 import BeautifulSoup
import pandas as pd

# main code
with open('list.txt', 'r') as f_in:
    for line in map(str.strip, f_in):
        if not line:
            continue

        response = requests.get(line)
        data = response.text
        soup = BeautifulSoup(data, 'html.parser')

        linecodes = []
        partnos = []

        for tbody in soup.select('tbody[id^="listingcontainer"]'):
            tmp = tbody.find('span', class_='listing-final-manufacturer')
            linecodes.append(tmp.text if tmp else '-')

            tmp = tbody.find('span', class_='listing-final-partnumber as-link-if-js buyers-guide-color')
            partnos.append(tmp.text if tmp else '-')

        # create dataframe
        df = pd.DataFrame(zip(linecodes,partnos), columns=['linecode', 'partno'])

        # save to csv
        df.to_csv('complete.csv')

        print(df)

list.txt

https://www.rockauto.com/en/catalog/ford,2010,f-150,6.2l+v8,1447337,brake+&+wheel+hub,brake+pad,1684
https://www.rockauto.com/en/catalog/ford,2015,f-150,5.0l+v8,3308775,brake+&+wheel+hub,brake+pad,1684
mjbaybay7
  • Does this answer your question? [Appending pandas dataframes generated in a for loop](https://stackoverflow.com/questions/28669482/appending-pandas-dataframes-generated-in-a-for-loop) – RichieV Sep 04 '20 at 05:05

1 Answer


You are saving the dataframe on each iteration, which overwrites the previous save. Instead, accumulate the dataframes across iterations, then save the combined dataframe once the loop completes. Something like:

# imports
import requests
from bs4 import BeautifulSoup
import pandas as pd

# main code
with open('list.txt', 'r') as f_in:
    final_df = pd.DataFrame()
    for line in map(str.strip, f_in):
        if not line:
            continue

        response = requests.get(line)
        data = response.text
        soup = BeautifulSoup(data, 'html.parser')

        linecodes = []
        partnos = []

        for tbody in soup.select('tbody[id^="listingcontainer"]'):
            tmp = tbody.find('span', class_='listing-final-manufacturer')
            linecodes.append(tmp.text if tmp else '-')

            tmp = tbody.find('span', class_='listing-final-partnumber as-link-if-js buyers-guide-color')
            partnos.append(tmp.text if tmp else '-')

        # create dataframe
        df = pd.DataFrame(zip(linecodes,partnos), columns=['linecode', 'partno'])
        print(df)
        final_df = pd.concat([final_df, df], ignore_index=True)

    # save to csv
    final_df.to_csv('complete.csv')

    print(final_df)
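One caveat for readers on current pandas: `DataFrame.append` was deprecated in 1.4 and removed in 2.0, and growing a dataframe row-by-row is slow anyway. The usual idiom now is to collect the per-URL frames in a list and concatenate once after the loop. A sketch with made-up stand-in data instead of the scraped lists:

```python
import pandas as pd

frames = []
# Each tuple stands in for the (linecodes, partnos) lists built per URL.
for linecodes, partnos in [(['MOTORCRAFT'], ['BR1258A']), (['WAGNER'], ['ZD1414'])]:
    frames.append(pd.DataFrame(zip(linecodes, partnos),
                               columns=['linecode', 'partno']))

# One concat at the end instead of repeated appends.
final_df = pd.concat(frames, ignore_index=True)
final_df.to_csv('complete.csv', index=False)
print(final_df)
```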
chitown88
  • Ok I see how you did this by introducing final_df. How can I print the completion of each data frame? for example, when the first one completes, how can I print `first request completed`? – mjbaybay7 Sep 04 '20 at 14:16
  • 1
    put in the loop – chitown88 Sep 04 '20 at 17:23
  • so like this? `df = pd.DataFrame(zip(linecodes,partnos), columns=['linecode', 'partno']) print(request completed) print(df) final_df = final_df.append(df, sort=False).reset_index(drop=True)` – mjbaybay7 Sep 04 '20 at 20:33
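For the record, the suggestion in that exchange amounts to a `print` inside the loop; with `enumerate` you also get a running count. A minimal sketch with placeholder URLs (not the real `list.txt` entries):

```python
urls = ['https://example.com/page1', 'https://example.com/page2']  # placeholders

for i, url in enumerate(urls, start=1):
    # ... fetch and build df for this url here ...
    print(f'request {i} of {len(urls)} completed')
```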