
I'm writing a script that needs to look up a value in a CSV file. The CSV file comes from an external source and I have little control over it, so I have to work with the file as it is given to me.

When I read the CSV file (currently about 30,000 rows), the script crashes at a certain point with this error:

File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
  return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0161' in position 55: character maps to <undefined>

My Python code for reading the CSV file:

import csv
with open('mybigfile.csv') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        print(row)

How can I fix this so that the script can handle the encoding of the file?

Timo002
  • Your problem is with **printing**, not reading. This is why you need to include the full traceback; it'd have shown you that the `print(row)` line is the problem here, not the `for row in reader:` line. – Martijn Pieters Nov 10 '14 at 15:49
  • @MartijnPieters, OK. But when I don't print anything and instead compare a column from the CSV file with an internal variable, I also get decoding errors (not encoding errors). – Timo002 Nov 10 '14 at 15:52
  • You are then stuck with having to know the codec used. It doesn't matter if the file is a CSV or anything else; you'll either need to specify the correct codec or use the `errors` argument when opening the file to specifically ignore decoding errors. – Martijn Pieters Nov 10 '14 at 15:53
  • There are Python codec detection libraries (such as `chardet`) but those are not a panacea either. Nothing beats knowing the actual codec used. – Martijn Pieters Nov 10 '14 at 15:54
  • I know that in PHP it is possible to detect the encoding. Is that possible in Python too? – Timo002 Nov 10 '14 at 15:55
  • PHP can make an educated **guess** at the encoding. `chardet` can guess too. – Martijn Pieters Nov 10 '14 at 15:57
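
The two points raised in the comments (pass the codec or an `errors` handler when opening the file, and treat printing as a separate encoding step) can be sketched roughly as follows. This is only a minimal sketch under assumptions: the helper names `read_rows` and `printable` are hypothetical, and the real encoding of the file is unknown here, so the codec in the usage comment is a placeholder to replace once the actual one is known.

```python
import csv
import sys


def read_rows(path, encoding="utf-8", errors="strict", delimiter=";"):
    """Yield rows from a CSV file decoded with an explicitly chosen codec.

    `encoding` must match the file's real codec (an assumption here);
    errors="replace" or errors="ignore" trades correctness for not crashing.
    """
    # newline="" is what the csv module documentation recommends for csv.reader
    with open(path, encoding=encoding, errors=errors, newline="") as f:
        for row in csv.reader(f, delimiter=delimiter):
            yield row


def printable(text, target_encoding=None):
    """Escape characters that the output codec (e.g. cp437) cannot encode."""
    target_encoding = target_encoding or sys.stdout.encoding or "ascii"
    encoded = text.encode(target_encoding, errors="backslashreplace")
    return encoded.decode(target_encoding)


# Hypothetical usage; the codec name is a placeholder, not a known fact:
# for row in read_rows("mybigfile.csv", encoding="cp1250"):
#     print(printable(repr(row)))
```

With `errors="strict"` a wrong codec still raises `UnicodeDecodeError`, which is usually preferable to silently corrupted values when a column is later compared against other data.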

0 Answers