
I'm trying to scrape data about motherboards from a local website.

import bs4
import os
import requests

from bs4 import BeautifulSoup as soup

os.chdir('E://')
os.makedirs('E://scrappy', exist_ok=True)
myurl = "https://www.example.com"
res = requests.get(myurl)
page = soup(res.content, 'html.parser')
containers = page.findAll("div", {"class": "content-product"})
filename = 'AM4.csv'
f = open(filename, 'w')
headers = 'Motherboard_Name, Price\n'
f.write(headers)

for container in containers:
    Product = container.findAll("div", {"class": "product-title"})
    Motherboard_Name = Product[0].text.strip()
    Kimat = container.findAll("span", {"class": "price"})
    Price = Kimat[0].text
    print('Motherboard_Name' + Motherboard_Name)
    print('Price' + Price)
    f.write(Motherboard_Name + "," + Price.replace(",", "") + "\n")
f.close()
print("done")

But when I run this code, I get an error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 45: character maps to <undefined>

How can I fix this?

Edit: I fixed the Unicode error by adding encoding="utf-8" to the open() call, i.e. open(filename, 'w', encoding="utf-8"), as suggested in "python 3.2 UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 9629: character maps to <undefined>". That seems to work, but in the CSV file I'm now getting characters like ( ₹ ) before the price. How can I fix this?

screenshot of the csv file

Mark Tolonen
user2996348
  • If you add in the start of your script: #!/usr/bin/env python # -*- coding: utf-8 -*- – Costis94 Jun 17 '17 at 10:39
  • In which line are you getting it? – Jeril Jun 17 '17 at 10:41
  • @Costis94 "line 32" File "E:\scrappy\motherboard.py", line 32, in f.write(Motherboard_Name + "," + Price.replace(",","") + "\n") – user2996348 Jun 17 '17 at 10:44
  • @user2996348 for your new question about strange characters, please check [the example solution in this link](https://stackoverflow.com/a/31642070/5638606) – Costis94 Jun 17 '17 at 12:36
  • @Costis94 `# coding: utf8` does nothing but declare the encoding of the *source file*. If the source file contains only ASCII characters, as the OP's code does, it has no effect. – Mark Tolonen Jun 18 '17 at 00:14
  • @user2996348 Per your screen shot, it looks like you are using Excel to view the .csv. Excel on Windows defaults to a localized encoding and not UTF-8. Use `encoding='utf-8-sig'` to write a byte order mark (BOM) at the start of the file and Excel will identify the encoding as UTF-8 and display it correctly. – Mark Tolonen Jun 18 '17 at 00:18
  • @Costis94 That example solution link is for Python 2, which has problems with encodings and its csv module. Python 3 does not have those problems. – Mark Tolonen Jun 18 '17 at 00:19
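The encoding mismatch discussed in the comments above can be reproduced without Excel. This is only a minimal sketch: it assumes the localized Windows encoding is cp1252 (the thread does not say which code page the OP's machine uses), and it prints with `ascii()` so that the demo itself cannot hit the same console encoding error.

```python
import codecs

rupee = '\u20b9'  # the rupee sign named in the error message

# The original failure: a legacy code page has no slot for the rupee sign.
# cp1252 is an assumption here; other Windows code pages behave similarly.
try:
    rupee.encode('cp1252')
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True

# UTF-8 encodes the rupee sign as three bytes.
utf8_bytes = rupee.encode('utf-8')

# Reading those three bytes back as cp1252 yields three unrelated
# characters, which is the kind of garbage seen in front of the price.
misread = utf8_bytes.decode('cp1252')

# 'utf-8-sig' writes the same bytes, preceded by a byte order mark (BOM).
with_bom = rupee.encode('utf-8-sig')

print(cp1252_failed)
print(utf8_bytes)
print(ascii(misread))
print(with_bom.startswith(codecs.BOM_UTF8))
```

The BOM is what lets a reader identify the file as UTF-8 rather than falling back to the localized default.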

1 Answer


Use the `csv` module to manage CSV files, and use `utf-8-sig` so that Excel recognizes the file as UTF-8. Also make sure to pass `newline=''` when opening the file, per the `csv` documentation.

Example:

import csv

filename = 'AM4.csv'
with open(filename,'w',newline='',encoding='utf-8-sig') as f:
    w = csv.writer(f)
    w.writerow(['Motherboard_Name','Price'])
    name = 'some name'
    price = '\u20b95,99'
    w.writerow([name,price.replace(',','')])

Excel image
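Applied to the scraping loop from the question, the same idea might look like the sketch below. The helper name `write_products` and the sample rows are hypothetical placeholders for the values scraped from each container, and the file is written to a temporary directory rather than to `E://` so the snippet runs anywhere.

```python
import csv
import os
import tempfile


def write_products(path, products):
    """Write (name, price) pairs to a CSV file that Excel can read as UTF-8."""
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        w = csv.writer(f)
        w.writerow(['Motherboard_Name', 'Price'])
        for name, price in products:
            # Drop thousands separators from the price, as in the question.
            w.writerow([name, price.replace(',', '')])


# Hypothetical rows standing in for the scraped name and price text.
products = [
    ('Board A', '\u20b95,999'),
    ('Board B, rev 2', '\u20b912,499'),
]

path = os.path.join(tempfile.mkdtemp(), 'AM4.csv')
write_products(path, products)

# Read the file back; 'utf-8-sig' strips the BOM on input.
with open(path, newline='', encoding='utf-8-sig') as f:
    rows = list(csv.reader(f))

print(len(rows))
```

Because `csv.writer` quotes fields that contain commas, a product name such as the second one above survives the round trip intact, which the manual string concatenation in the question would not guarantee.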

Mark Tolonen