I was somewhat able to write and execute the program using BeautifulSoup. My concept is to capture the details from html source by parsing multiple urls via csv file and save the output as csv.
Programming is executing well, but the csv overwrites the values in 1st row itself.
input File has three urls to parse
I want the output to be stored in 3 different rows.
Below is my code
import csv
import requests
import pandas
from bs4 import BeautifulSoup
with open("input.csv", "r") as f:
reader = csv.reader(f)
for row in reader:
url = row[0]
print (url)
r=requests.get(url)
c=r.content
soup=BeautifulSoup(c, "html.parser")
all=soup.find_all("div", {"class":"biz-country-us"})
for br in soup.find_all("br"):
br.replace_with("\n")
l=[]
for item in all:
d={}
name=item.find("h1",{"class":"biz-page-title embossed-text-white shortenough"})
d["name"]=name.text.replace(" ","").replace("\n","")
claim=item.find("div", {"class":"u-nowrap claim-status_teaser js-claim-status-hover"})
d["claim"]=claim.text.replace(" ","").replace("\n","")
reviews=item.find("span", {"class":"review-count rating-qualifier"})
d["reviews"]=reviews.text.replace(" ","").replace("\n","")
l.append(d)
df=pandas.DataFrame(l)
df.to_csv("output.csv")
Please kindly let me know if Im not clear on explaining anything.