I know there are already hundreds of Python Unicode questions on Stack Overflow. I've read lots of them, but I can't find an answer to mine...
I'm trying to read a latin-1 CSV file. It includes a UK pound sign (character \xa3 in latin-1), so I set encoding="latin-1", but Python appears to ignore the encoding. This:
    import csv

    with open(filename, newline='', encoding="latin-1") as csvfile:
        data = csv.reader(csvfile, delimiter=',', quotechar='"')
        for row in data:
            print(row)
Produces:
UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 202: ordinal not in range(128)
I've cut down the original CSV file to a single line that triggers the problem. It's the £ sign that causes it.
The only solutions I've found are to use errors="ignore", which just hides the problem, or errors="surrogateescape", which just creates a problem with escaped characters further down the line.
I know that the file encoding is latin-1, although I have also tried utf-8 and iso-8859-1.
Python can happily print a £ sign:
    >>> print('£')
    £
    >>> print(u'\xa3')
    £
Any answers/advice/suggestions would be welcome. Thanks in advance.
=== UPDATE ===
This doesn't produce the error:
    with open(file, newline='', encoding="latin-1") as csvfile:
        data = csv.reader(csvfile, delimiter=',', quotechar='"')
        for row in data:
            print("do nothing with the data")
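So reading and decoding the file seems to work, and the exception is an *encode* error, which suggests it is raised when print() writes the row to the output stream rather than when the CSV is read. Below is a minimal sketch that reproduces the same kind of error without the CSV file. It assumes the output stream in the failing environment uses an ASCII codec, which I haven't confirmed. The names `can_encode`, `raw` and `ascii_out` are made up for this illustration.

```python
import io
import sys


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# latin-1 bytes containing a pound sign (0xa3), standing in for one CSV line.
raw = b"price,\xa35.00\r\n"

# This is the decoding step that open(..., encoding="latin-1") performs.
text = raw.decode("latin-1")

# Simulate an ASCII-only output stream (assumption: this is what stdout
# looks like in the environment that raises the error).
ascii_out = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    print(text, file=ascii_out)
    ascii_out.flush()
    failed = False
except UnicodeEncodeError:
    failed = True

# ascii() escapes non-ASCII characters, so these lines are safe to print
# even when stdout itself cannot encode a pound sign.
print("stdout encoding:", sys.stdout.encoding)
print("decoded text:", ascii(text))
print("print to ASCII stream failed:", failed)
```

If `sys.stdout.encoding` reports an ASCII codec where the script runs but not in the interactive session, that would explain why `print('£')` works in one place and fails in the other.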