This seems like it should be an easy fix, but so far a solution has eluded me. I have a single column csv file with non-ascii chars saved in utf-8 that I want to read in and store in a list. I'm attempting to follow the principle of the "Unicode Sandwich" and decode upon reading the file in:
import codecs
import csv
with codecs.open('utf8file.csv', 'rU', encoding='utf-8') as file:
input_file = csv.reader(file, delimiter=",", quotechar='|')
list = []
for row in input_file:
list.extend(row)
This produces the dread 'codec can't encode characters in position, ordinal not in range(128)' error.
I've also tried adapting a solution from this answer, which returns a similar error
def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs):
csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
for row in csv_reader:
yield [unicode(cell, 'utf-8') for cell in row]
filename = 'inputs\encode.csv'
reader = unicode_csv_reader(open(filename))
target_list = []
for field1 in reader:
target_list.extend(field1)
A very similar solution adapted from the docs returns the same error.
def unicode_csv_reader(utf8_data, dialect=csv.excel):
csv_reader = csv.reader(utf_8_encoder(utf8_data), dialect)
for row in csv_reader:
yield [unicode(cell, 'utf-8') for cell in row]
def utf_8_encoder(unicode_csv_data):
for line in unicode_csv_data:
yield line.encode('utf-8')
filename = 'inputs\encode.csv'
reader = unicode_csv_reader(open(filename))
target_list = []
for field1 in reader:
target_list.extend(field1)
Clearly I'm missing something. Most of the questions that I've seen regarding this problem seem to predate Python 2.7, so an update here might be useful.