3

Possible Duplicate:
Open a file in the proper encoding automatically

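My code:

import csv

def handle_uploaded_file(f):
    # f is the uploaded file object; fields are ';'-separated and '"'-quoted
    dataReader = csv.reader(f, delimiter=';', quotechar='"')
    for row in dataReader:
        do_sth(row)  # placeholder for the per-row processing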

The problem is that it works only if the CSV is UTF-8 encoded. What should I change to handle ISO-8859-2 or windows-1250 input? (The best solution would be to detect the encoding automatically, but converting by hand is also acceptable.)

Tomasz Brzezina

3 Answers

5

The solution for now:

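# Python 2 only: csv.reader there expects byte strings, so each line is re-encoded as UTF-8.
def reencode(file):
    for line in file:
        yield line.decode('windows-1250').encode('utf-8')

csv_reader = csv.reader(reencode(open(filepath, 'rb')), delimiter=';', quotechar='"')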
Tomasz Brzezina
  • 2
    This is not the correct answer. From the csv documentation: "Since open() is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding (see locale.getpreferredencoding()). To decode a file using a different encoding, use the encoding argument of open." –  Dec 15 '17 at 17:15
  • 3
    I was able to open the file with `open(filename, 'r', encoding='latin-1') as f:` and it fixed the encoding errors I was getting. A standard list of encodings can be found here: https://docs.python.org/3/library/codecs.html#standard-encodings – Max Candocia Jan 09 '18 at 16:10
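
The two comments above describe the Python 3 approach, where decoding happens when the file is opened rather than line by line. A minimal sketch of that idea follows, assuming Python 3 and an upload that arrives as a binary file object; the names `read_rows` and `sample` are made up for illustration and are not from the question.

```python
import csv
import io

def read_rows(binary_file, encoding="windows-1250"):
    """Decode a binary file object and return its CSV rows as lists of str."""
    # newline="" leaves line-ending handling to the csv module.
    text = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    reader = csv.reader(text, delimiter=";", quotechar='"')
    return list(reader)

# An in-memory stand-in for an uploaded windows-1250 file.
sample = "imię;miasto\nŁukasz;Łódź\n".encode("windows-1250")
rows = read_rows(io.BytesIO(sample))
```

For a file on disk, `open(path, encoding="windows-1250", newline="")` can be passed to `csv.reader` directly, with no wrapper needed.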
3

Have a look at the examples section of the csv module documentation. At the end, you'll find classes you can use for exactly that purpose, specifying the encoding.

Mattie
1

Pass a file object opened with codecs.open. Encodings cannot be detected reliably, but to guess the encoding you can use chardet.
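
As a rough illustration of guessing without a third-party library, the sketch below tries strict UTF-8 first and falls back to a single legacy encoding if that fails. This is a swapped-in heuristic, not what chardet does, and it assumes Python 3. It cannot tell ISO-8859-2 from windows-1250, so the fallback still has to be chosen by hand. The function names are hypothetical.

```python
import csv
import io

def decode_with_fallback(raw, fallback="windows-1250"):
    """Return (text, encoding): strict UTF-8 if it decodes, else the fallback."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Non-ASCII text in a legacy 8-bit encoding is rarely valid UTF-8.
        return raw.decode(fallback), fallback

def rows_from_bytes(raw, fallback="windows-1250"):
    """Decode raw bytes and parse them as ';'-separated CSV."""
    text, encoding = decode_with_fallback(raw, fallback)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";", quotechar='"')
    return encoding, list(reader)

utf8_result = rows_from_bytes("Łódź;1\n".encode("utf-8"))
legacy_result = rows_from_bytes("Łódź;1\n".encode("windows-1250"))
```

Pure ASCII input decodes as UTF-8 here, which is harmless because ASCII bytes mean the same thing in all three encodings mentioned in the question.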

dav1d