I have managed to write and run a program using BeautifulSoup. The idea is to read several URLs from a CSV file, parse each page's HTML source to capture some details, and save the output as a CSV file.

The program runs fine, but the values in the output CSV keep being overwritten in the first row.

The input file has three URLs to parse.

I want the output to be stored in three separate rows.

Below is my code:

import csv
import requests
import pandas
from bs4 import BeautifulSoup

with open("input.csv", "r") as f:
    reader = csv.reader(f)

    for row in reader:
        url = row[0]

        print (url)
        r=requests.get(url)

        c=r.content

        soup=BeautifulSoup(c, "html.parser")

        all=soup.find_all("div", {"class":"biz-country-us"})

        for br in soup.find_all("br"):
            br.replace_with("\n")

l=[]
for item in all:
    d={}

    name=item.find("h1",{"class":"biz-page-title embossed-text-white shortenough"})
    d["name"]=name.text.replace("  ","").replace("\n","")

    claim=item.find("div", {"class":"u-nowrap claim-status_teaser js-claim-status-hover"})
    d["claim"]=claim.text.replace("  ","").replace("\n","")

    reviews=item.find("span", {"class":"review-count rating-qualifier"})
    d["reviews"]=reviews.text.replace("  ","").replace("\n","")



    l.append(d)

df=pandas.DataFrame(l)
df.to_csv("output.csv")

Please let me know if anything in my explanation is unclear.

  • How about opening the file in append mode like here: https://stackoverflow.com/questions/17530542/how-to-add-pandas-data-to-an-existing-csv-file ? – enys Sep 28 '18 at 10:53
  • A. I think you should use pd.read_csv to fetch the URLs, store them in a list, and traverse the list. B. I think the part where you parse the HTML should be inside the for loop where you read the URLs. – iamklaus Sep 28 '18 at 10:58
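
Suggestion B could be sketched roughly as follows. This is only an illustration, not the asker's code: the helper names `clean`, `extract_rows`, and `scrape_all` are hypothetical, each field is matched on a single CSS class rather than the full class string, and whitespace is collapsed instead of stripped with `replace`. The point is that parsing happens once per URL and every page adds to the same list.

```python
from bs4 import BeautifulSoup


def clean(tag):
    """Return the tag's text with whitespace collapsed, or "" if the tag is missing."""
    return " ".join(tag.get_text().split()) if tag is not None else ""


def extract_rows(html):
    """Parse one page and return one dict per matching business block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="biz-country-us"):
        rows.append({
            "name": clean(item.find("h1", class_="biz-page-title")),
            "claim": clean(item.find("div", class_="claim-status_teaser")),
            "reviews": clean(item.find("span", class_="review-count")),
        })
    return rows


def scrape_all(urls, fetch):
    """Collect rows from every URL; `fetch` is any callable mapping a URL to HTML."""
    all_rows = []
    for url in urls:
        # Parsing happens inside the loop, so no page's result is lost.
        all_rows.extend(extract_rows(fetch(url)))
    return all_rows
```

With requests, `fetch` could be `lambda url: requests.get(url).text`, and the combined list can then be written once with `pandas.DataFrame(rows).to_csv("output.csv", index=False)`.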

1 Answer

Open the output file in append mode, as suggested in this post, with the modification that you write the header only the first time:

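from os.path import isfile

# isfile() takes only the path; the header is written only when the file does not exist yet
if not isfile("output.csv"):
    df.to_csv("output.csv", header=True)
else:
    # newline="" avoids blank lines between rows on Windows when passing a file object
    with open("output.csv", "a", newline="") as f:
        df.to_csv(f, header=False)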
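
The same idea can be wrapped in a small helper that is called once per URL. This is a minimal sketch under a few assumptions: the function name `append_rows` is made up for illustration, the rows are plain dicts with the same keys on every call, and the index column is dropped with `index=False`.

```python
import os

import pandas


def append_rows(rows, path="output.csv"):
    """Append rows to `path`, writing the header only when the file is new."""
    df = pandas.DataFrame(rows)
    # Decide before writing: the header is needed only on the first call.
    write_header = not os.path.isfile(path)
    df.to_csv(path, mode="a", header=write_header, index=False)
```

Because the file is opened in append mode, a leftover output.csv from an earlier run will keep growing, so it may need to be deleted before the script starts.
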
enys