I am attempting to scrape a website using BeautifulSoup. It mostly works, but I have two issues:
After I get the data from the website, I print it to the screen and also write it to a CSV file. There is a price field on the website which has a rupee symbol in front of the actual amount (sample structure of the price field: ₹ 10000). When I print the amount to the console, it prints fine. When I try to write it to the CSV file, I get the error: UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 28. The other fields are written to both the console and the file without problems; the issue shows up only with two fields, one containing the currency symbol and the other containing a * symbol.
I have a loop that fetches all the result pages for a particular search. The search returns about 344 pages, but the loop stops at about page 43, and the only error message is HTTP error 500.
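The first issue can probably be reproduced without any scraping. The sketch below assumes the file is being opened with a legacy Windows code page such as cp1252, which is only a guess based on the 'charmap' wording of the error; the row contents and the helper name `try_write` are made up for illustration.

```python
import os
import tempfile

# Hypothetical row containing the rupee sign (U+20B9), like the price field.
row = "Sample Phone,\u20b9 10000\n"


def try_write(path, encoding):
    """Write the row using the given encoding; return True if it succeeds."""
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(row)
        return True
    except UnicodeEncodeError:
        return False


path = os.path.join(tempfile.mkdtemp(), "data.csv")

# cp1252 has no mapping for the rupee sign, so this write fails.
cp1252_ok = try_write(path, "cp1252")

# UTF-8 can represent every Unicode character, so this write succeeds.
utf8_ok = try_write(path, "utf-8")

print(cp1252_ok, utf8_ok)
```

If this matches the real cause, passing `encoding="utf-8"` to `open()` should avoid the error. Excel may additionally need `encoding="utf-8-sig"` to recognise the file as UTF-8, but that depends on the Excel version and is not verified here.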
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as Soup

filename = "data.csv"
f = open(filename,"w")
headers = "phone_name, phone_price, phone_rating,number_of_ratings, memory, display, camera, battery, processor, Warrenty, security, OS\n"
f.write(headers)

for i in range(2): # Number of pages minus one
    my_url = 'https://www.flipkart.com/search?as=off&as-show=on&otracker=start&page={}&q=cell+phones&viewType=list'.format(i+1)
    print(my_url)
    uClient=uReq(my_url)
    page_html=uClient.read()
    page_soup = Soup(page_html,"html.parser")
    containers=page_soup.findAll("a", {"class":"_1UoZlX"})
    for container in containers:
        phone_name = container.find("div",{"class":"_3wU53n"}).text
        try:
            phone_price = container.find("div",{"class":"_1vC4OE _2rQ-NK"}).text
        except:
            phone_price = 'No Data'
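For the second issue, the cause of the HTTP 500 is not known from the information above; the server might be rejecting rapid requests or might not serve pages past a certain number. As a rough sketch only, the page fetch could be wrapped so that a server error is retried a few times and then skipped instead of stopping the whole loop. The names `fetch_with_retry`, `opener`, `retries` and `delay` are hypothetical and not part of the original script.

```python
import time
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_with_retry(url, opener=urlopen, retries=3, delay=1.0):
    """Return the response body, retrying on HTTP 5xx errors.

    Returns None if every attempt fails with a server error.
    """
    for attempt in range(retries):
        try:
            with opener(url) as response:
                return response.read()
        except HTTPError as err:
            if err.code < 500:
                raise  # client errors (4xx) are not retried
            print(f"{url}: HTTP {err.code} (attempt {attempt + 1}/{retries})")
            # Wait a little longer after each failed attempt.
            time.sleep(delay * (attempt + 1))
    return None
```

In the page loop, `page_html = fetch_with_retry(my_url)` followed by `if page_html is None: continue` would then skip a page that keeps failing, which at least shows whether the error is tied to specific pages or to the request rate.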
Thank you very much for all your help!