I have read through similar questions on Stack Overflow, but none of them solves the Unicode problem I have: 'ascii' codec can't decode byte 0xc3 in position 302.

I have tried: import sys; reload(sys); sys.setdefaultencoding("utf-8")

However, I receive an error: NameError: name 'reload' is not defined

I am trying to read a file containing the Danish vowels æ, ø, å. In return I receive "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 302", and so on. Position 302 and the positions after it contain Danish vowels. Is there a way to fix this?

So far I have tried putting a specially formatted comment as the first line of the source code: # -*- coding: <ascii> -*-. It had no effect.

I also tried: f = open(fname, encoding="ascii", errors="surrogateescape"). But instead of reading the file with the characters as they are, for example in the word "Europæiske", I get "Europ\udcc3\udca6iske".

Then I tried a suggestion from a blog (I have lost the link) to "import unicodedata"; however, it did not explain well where to take it from there.

import unicodedata
import csv

with open('File.csv') as f:
  reader = csv.reader(f)
  for row in reader:
    print(row)
Mr Lister
Nadia S
    Possible duplicate of [UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128)](http://stackoverflow.com/questions/24475393/unicodedecodeerror-ascii-codec-cant-decode-byte-0xc3-in-position-23-ordinal) – kchomski Mar 15 '16 at 16:01
  • kchomski, I am not trying to concatenate but rather to read a file with Danish characters; it is a different case! – Nadia S Mar 15 '16 at 16:07

2 Answers

Simply open the file with the correct encoding. You have to know the encoding the file was saved in. On Western versions of Windows it might be Windows-1252, or perhaps UTF-8. Modules such as chardet can make an educated guess. Also, for the csv module, open with newline='' as well (see the documentation for csv.reader):

import csv

with open('File.csv',encoding='utf8',newline='') as f:
  reader = csv.reader(f)
  for row in reader:
    print(row)
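
If you do not know the encoding in advance, one rough way to handle it is to try a short list of candidates in order. The sketch below is only an illustration: the helper name read_csv_rows and the candidate list ("utf-8", then "cp1252") are assumptions, not something from the question. UTF-8 goes first because cp1252 accepts almost any byte sequence and would silently turn UTF-8 bytes into mojibake.

```python
import csv


def read_csv_rows(path, encodings=("utf-8", "cp1252")):
    """Return all rows of a CSV file, trying each candidate encoding in order.

    Hypothetical helper: the name and the default candidate list are
    assumptions made for this example.
    """
    for encoding in encodings:
        try:
            # newline='' lets the csv module handle line endings itself.
            with open(path, encoding=encoding, newline="") as f:
                return list(csv.reader(f))
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            continue
    raise ValueError(f"none of {encodings} could decode {path!r}")
```

A call such as read_csv_rows('File.csv') then returns the rows as lists of str. A successful decode does not prove the guess was right, so check that æ, ø and å come out as expected.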
Mark Tolonen

That # -*- coding: -*- comment only applies to what is in the program's source itself, for example if you define a variable or function with Danish characters.

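What you're dealing with is I/O, so remember the rule: bytes on the edges, Unicode inside. In Python 3 this means bytes.decode when reading in and str.encode when writing out (str.decode and unicode.encode in Python 2). Passing encoding= to open() performs both steps for you.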
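
As a small illustration of that rule in Python 3, here is a sketch that assumes the data is UTF-8 (the 0xc3 byte in the error message is consistent with that, but it is still an assumption). It also shows where the "\udcc3\udca6" in the question comes from: with errors="surrogateescape" each undecodable byte is turned into a lone surrogate instead of being decoded.

```python
# Bytes as they would sit in a UTF-8 encoded file (edge of the program).
raw = "Europæiske".encode("utf-8")

# Decode once on the way in; work with str (Unicode) inside.
text = raw.decode("utf-8")
shouted = text.upper()

# Encode once on the way out.
out = shouted.encode("utf-8")

# Decoding the same bytes as ASCII with surrogateescape does not fail,
# but the two bytes of "æ" (0xc3 0xa6) become two lone surrogates.
escaped = raw.decode("ascii", errors="surrogateescape")
print(ascii(escaped))

# The escaped string still round-trips back to the original bytes.
restored = escaped.encode("ascii", errors="surrogateescape")
```

The last step is why surrogateescape is useful for passing unknown bytes through unchanged, but not for displaying text.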

jcomeau_ictx