
My problem is that only the data from the most recent URL request is saved. How can I save all the responses? I tried `df.to_csv('complete.csv', 'a')`, but that creates a jumbled file.
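(Side note on that attempt: in the pandas versions current at the time, the second positional argument of `to_csv` is `sep`, so `'a'` was taken as the field delimiter, which would explain the jumble. File-append mode needs the `mode` keyword instead. A minimal sketch with made-up rows:)

```python
import pandas as pd

# Hypothetical rows standing in for two scraped batches.
df1 = pd.DataFrame({'linecode': ['MOTORCRAFT'], 'partno': ['BR1258A']})
df2 = pd.DataFrame({'linecode': ['WAGNER'], 'partno': ['ZD1414']})

# First write creates the file with a header row...
df1.to_csv('complete.csv', index=False)
# ...later writes append rows, suppressing the repeated header.
df2.to_csv('complete.csv', mode='a', header=False, index=False)

print(pd.read_csv('complete.csv'))
```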

# imports
import requests
from bs4 import BeautifulSoup
import pandas as pd

# main code
with open('list.txt', 'r') as f_in:
    for line in map(str.strip, f_in):
        if not line:
            continue

        response = requests.get(line)
        data = response.text
        soup = BeautifulSoup(data, 'html.parser')

        linecodes = []
        partnos = []

        for tbody in soup.select('tbody[id^="listingcontainer"]'):
            tmp = tbody.find('span', class_='listing-final-manufacturer')
            linecodes.append(tmp.text if tmp else '-')

            tmp = tbody.find('span', class_='listing-final-partnumber as-link-if-js buyers-guide-color')
            partnos.append(tmp.text if tmp else '-')

        # create dataframe
        df = pd.DataFrame(zip(linecodes,partnos), columns=['linecode', 'partno'])

        # save to csv
        df.to_csv('complete.csv')

        print(df)

list.txt

https://www.rockauto.com/en/catalog/ford,2010,f-150,6.2l+v8,1447337,brake+&+wheel+hub,brake+pad,1684
https://www.rockauto.com/en/catalog/ford,2015,f-150,5.0l+v8,3308775,brake+&+wheel+hub,brake+pad,1684
mjbaybay7
  • Does this answer your question? [Appending pandas dataframes generated in a for loop](https://stackoverflow.com/questions/28669482/appending-pandas-dataframes-generated-in-a-for-loop) – RichieV Sep 04 '20 at 05:05

1 Answer


You are saving the dataframe on each iteration, which overwrites the previous save. Instead, accumulate the dataframes across iterations, then save the combined dataframe once the loop completes. Something like:

# imports
import requests
from bs4 import BeautifulSoup
import pandas as pd

# main code
with open('list.txt', 'r') as f_in:
    final_df = pd.DataFrame()
    for line in map(str.strip, f_in):
        if not line:
            continue

        response = requests.get(line)
        data = response.text
        soup = BeautifulSoup(data, 'html.parser')

        linecodes = []
        partnos = []

        for tbody in soup.select('tbody[id^="listingcontainer"]'):
            tmp = tbody.find('span', class_='listing-final-manufacturer')
            linecodes.append(tmp.text if tmp else '-')

            tmp = tbody.find('span', class_='listing-final-partnumber as-link-if-js buyers-guide-color')
            partnos.append(tmp.text if tmp else '-')

        # create dataframe
        df = pd.DataFrame(zip(linecodes,partnos), columns=['linecode', 'partno'])
        print(df)
        final_df = pd.concat([final_df, df], ignore_index=True)

    # save to csv
    final_df.to_csv('complete.csv')

    print(final_df)
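One caveat for readers on current pandas: `DataFrame.append` was deprecated in 1.4 and removed in 2.0, and growing a dataframe row-by-row is slow anyway. The usual idiom now is to collect the per-URL frames in a list and concatenate once after the loop. A sketch with made-up stand-in data instead of the scraped lists:

```python
import pandas as pd

frames = []
# Each tuple stands in for the (linecodes, partnos) lists built per URL.
for linecodes, partnos in [(['MOTORCRAFT'], ['BR1258A']), (['WAGNER'], ['ZD1414'])]:
    frames.append(pd.DataFrame(zip(linecodes, partnos),
                               columns=['linecode', 'partno']))

# One concat at the end instead of repeated appends.
final_df = pd.concat(frames, ignore_index=True)
final_df.to_csv('complete.csv', index=False)
print(final_df)
```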
chitown88
  • Ok I see how you did this by introducing final_df. How can I print the completion of each data frame? for example, when the first one completes, how can I print `first request completed`? – mjbaybay7 Sep 04 '20 at 14:16
  • 1
    put in the loop – chitown88 Sep 04 '20 at 17:23
  • so like this? `df = pd.DataFrame(zip(linecodes,partnos), columns=['linecode', 'partno']) print(request completed) print(df) final_df = final_df.append(df, sort=False).reset_index(drop=True)` – mjbaybay7 Sep 04 '20 at 20:33
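For the record, the suggestion in that exchange amounts to a `print` inside the loop; with `enumerate` you also get a running count. A minimal sketch with placeholder URLs (not the real `list.txt` entries):

```python
urls = ['https://example.com/page1', 'https://example.com/page2']  # placeholders

for i, url in enumerate(urls, start=1):
    # ... fetch and build df for this url here ...
    print(f'request {i} of {len(urls)} completed')
```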