Trying to extract data from a table and there are foreign characters keeping me from writing to a csv file

Question

I'm extracting data from a table, but some special characters are causing an error.

from unicodedata import normalize


import codecs
import csv
import urllib2
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://www.ratebeer.com/top'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html)
table = soup.find('tbody')

list_of_rows = []


for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

outfile = open("./top50.csv", "wb")
writer = csv.writer(outfile)
writer.writerows(list_of_rows)

I'm trying to produce a CSV file to import into Excel with the top 50 beers: rank, name, style, brewery, rating.

You should not open the `outfile` as binary and set an appropriate `encoding` as parameter. — Michael Butscher, May 01 '19 at 01:18
Possible duplicate of [Read and Write CSV files including unicode with Python 2.7](https://stackoverflow.com/questions/17245415/read-and-write-csv-files-including-unicode-with-python-2-7) — snakecharmerb, May 01 '19 at 07:03

score 0 · Answer 1 · answered May 01 '19 at 01:30

This works on Python 3.6. I specified the parser with features="lxml" and opened the output file in text mode with encoding='utf-8':

import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.ratebeer.com/top'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html, features="lxml")
table = soup.find('tbody')

list_of_rows = []

for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

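# Text mode with an explicit encoding; newline='' lets the csv module manage line endings
with open("./top50.csv", "w", encoding='utf-8', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(list_of_rows)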
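
If you want to check the file-writing part without hitting the site, here is a minimal sketch for Python 3. The helper names `write_rows` and `read_rows` and the sample rows are made up for illustration; they are not part of the scraper above. It writes rows containing accented characters as UTF-8 and reads them back.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows of text cells to a UTF-8 encoded CSV file (hypothetical helper)."""
    # Text mode ("w", not "wb") with an explicit encoding;
    # newline="" lets the csv module control line endings.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


def read_rows(path):
    """Read the CSV back as a list of rows (hypothetical helper)."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    # Made-up sample data with non-ASCII characters, not real scraped rows.
    sample = [
        ["1", "Bière Spéciale", "Brasserie Exemple"],
        ["2", "Öl Nummer Två", "Bryggeri Exempel"],
    ]
    path = os.path.join(tempfile.mkdtemp(), "sample.csv")
    write_rows(path, sample)
    print(read_rows(path) == sample)
```

If the round trip succeeds, any remaining problem is likely in how the program that opens the file guesses its encoding, not in the write itself.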

QHarr · Answer 2 · 2019-05-01T08:50:35.770

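Consider using pandas. You can pass encoding='utf-8-sig' to to_csv; it writes a byte-order mark at the start of the file, which helps Excel recognise the file as UTF-8 and display the special characters correctly.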

import pandas as pd
import requests
r = requests.get('https://www.ratebeer.com/top', headers = {'User-Agent' : 'Mozilla/5.0'})
table = pd.read_html(r.text)[0]
table.drop(['Unnamed: 5'], axis=1, inplace = True)
table.columns = ['Rank', 'Name', 'Count', 'Abv', 'Score']
table.to_csv(r"C:\Users\User\Desktop\Data.csv", sep=',', encoding='utf-8-sig',index = False )
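
To see what 'utf-8-sig' actually changes, here is a small sketch using only the standard library (no pandas, no network). The function name `first_bytes` and the sample rows are hypothetical, chosen just for this illustration. It writes the same rows with each encoding and returns the first three bytes of the file.

```python
import csv
import os
import tempfile

# Made-up sample rows with non-ASCII characters, for illustration only.
SAMPLE_ROWS = [["Rank", "Name"], ["1", "Bière Spéciale"]]


def first_bytes(encoding, rows=SAMPLE_ROWS, count=3):
    """Write rows as CSV using `encoding` and return the file's first `count` bytes."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding, newline="") as f:
            csv.writer(f).writerows(rows)
        with open(path, "rb") as f:
            return f.read(count)
    finally:
        os.remove(path)  # clean up the temporary file


if __name__ == "__main__":
    print(first_bytes("utf-8"))      # starts directly with the header text
    print(first_bytes("utf-8-sig"))  # starts with the UTF-8 byte-order mark
```

With 'utf-8-sig' the file begins with the bytes EF BB BF; with plain 'utf-8' it begins with the header text itself. The rest of the content is encoded identically.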
