
I am working on a program that was written in Python under Windows. It reads a CSV file. Here is the relevant part of the code:

with open(os.path.abspath(self.currencies_file_path), 'r') as f:
    reader = csv.reader(f)
    # for each row, find whether such an ISO code exists in the table
    for row in reader:   # this is line 49
        ...

And this is the error:

  File "whatever/staticdata.py", line 49, in upload_currencies
    for row in reader:
  File "/usr/lib/python3.4/codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 1307: invalid continuation byte

The CSV file is not even encoded in UTF-8 (I think). Why am I getting this error?

P.S. I don't know anything about encodings.

khajvah

3 Answers


To check the file encoding, you can use the file command:

$ file utils.py
utils.py: Python script, UTF-8 Unicode text executable

To convert a file, you can use the iconv command:

iconv -f ascii -t utf-8 utils.py -o utils.utf8.py

Options: -f: source (from) encoding; -t: target (to) encoding; -o: output file.
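If iconv is not available (for example on Windows), the same conversion can be sketched in Python. This is a minimal sketch assuming the source file is ISO-8859-1 (latin-1); the function name `convert_file` and its defaults are hypothetical, not part of any library.

```python
import io


def convert_file(src_path, dst_path, from_encoding="latin-1", to_encoding="utf-8"):
    """Re-encode a text file, roughly like `iconv -f FROM -t TO SRC -o DST`.

    Hypothetical helper: the name and the default encodings are assumptions.
    """
    # Decode the raw bytes using the source encoding.
    # newline="" keeps the original line endings untouched.
    with io.open(src_path, "r", encoding=from_encoding, newline="") as src:
        text = src.read()
    # Write the same text back out using the target encoding.
    with io.open(dst_path, "w", encoding=to_encoding, newline="") as dst:
        dst.write(text)
```

For example, `convert_file("data/currencies.csv", "data/currencies.utf8.csv")` would write a UTF-8 copy of the file from the question, assuming it really is latin-1.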

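Last but not least, explicitly declare the encoding of your Python source file (at the top, right below the shebang line). Note that this declaration only tells Python how to read the script itself; it does not change the encoding used to open data files such as your CSV: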

# -*- coding: utf-8 -*-

So, for a working example, you would have something like:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

For a list of encodings supported by iconv, you can type:

iconv -l
Peter
Paco
  • This is my CSV file: `data/currencies.csv: ISO-8859 text`. Why am I having this kind of problem? – khajvah Jan 05 '15 at 17:20
  • Windows and Linux use different default encodings; it's usually UTF-8 on Linux. I am pretty sure you don't have an encoding line in your script (the line that starts with `# -*- coding`). If you don't tell Python which one you use, it falls back to a default for source files (ASCII in Python 2, UTF-8 in Python 3). There might be different fallbacks on Linux and Windows, but it's a good idea to be explicit and tell it exactly what it is. – Paco Jan 05 '15 at 17:23
  • Oh wait a second, you're talking about a CSV file, not a Python file. You can still use the iconv command to convert it, though. – Paco Jan 05 '15 at 17:25
  • So Python is trying UTF-8 by default, and that is why it doesn't work? – khajvah Jan 05 '15 at 17:26
  • By default it uses the encoding of your `locale`, which is usually UTF-8 on Linux. Reading a CSV file works like reading any other text file: if you don't pass an encoding to `open()`, it uses the locale's encoding. – Paco Jan 05 '15 at 17:30
  • Found an "ó" in the CSV; that is where it fails. – khajvah Jan 05 '15 at 17:33
  • It worked. I will accept your answer, but one thing: the flag for output is not `-i` but `-o`. – khajvah Jan 05 '15 at 17:44
  • According to the `man` page: `--output, -o file Specify output file (instead of stdout).` – Paco Jan 06 '15 at 10:11
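The points raised in these comments can be checked with a short sketch (Python 3). It prints the locale encoding that `open()` falls back to when no encoding is given, then decodes a small made-up sample containing the byte 0xf3 from the traceback. In UTF-8, 0xf3 starts a four-byte sequence, so the following ordinary ASCII byte is rejected as an invalid continuation byte, while latin-1 and CP-1252 both read 0xf3 as "ó".

```python
import locale

# Made-up sample: 0xf3 is "ó" in ISO-8859-1 (latin-1) and in CP-1252.
raw = b"D\xf3lar"

# open() without an encoding argument uses this locale-dependent encoding.
print("default text encoding:", locale.getpreferredencoding(False))

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # Same kind of error as in the question's traceback.
    print("utf-8 failed:", exc)

print("latin-1:", raw.decode("latin-1"))
print("cp1252:", raw.decode("cp1252"))
```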

If you think it's latin-1, try this:

import csv
import io
import os

with io.open(os.path.abspath(self.currencies_file_path), encoding='latin-1', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        ...  # process each row here
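A self-contained version of the snippet above, written as a plain function instead of a method so it can be tried outside the class. The name `read_currencies` is hypothetical, latin-1 is still only a guess about the file, and `newline=''` follows the `csv` module documentation's advice for files passed to `csv.reader`.

```python
import csv
import io


def read_currencies(path, encoding="latin-1"):
    """Return all rows of a CSV file decoded with the given encoding.

    Hypothetical helper; the latin-1 default is an assumption about the file.
    """
    # newline="" lets the csv module handle \r\n line endings itself.
    with io.open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))
```

With the file from the question, `read_currencies("data/currencies.csv")` would return a list of rows (each a list of strings) with characters such as "ó" decoded correctly, provided the file really is latin-1.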
user590028

On Windows, Python is probably using CP-1252 as the default encoding.

There is no way to know, 100% of the time, which encoding a file is using; see this Stack Overflow question for reference. If you are using Python 3, just specify the encoding when opening the file. If you're using Python 2, you can use io.open to specify an encoding.
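Since the encoding cannot be detected reliably, one common workaround is to try a short list of likely encodings in order. This is a heuristic sketch, not detection: the function names and the candidate order (UTF-8, then CP-1252, then latin-1) are assumptions. latin-1 comes last because it maps every byte to some character and never fails, so it only guarantees a result, not a correct one.

```python
def decode_with_fallback(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes `data`.

    Hypothetical helper: a successful decode does not prove the guess is right.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            # Not valid in this encoding; try the next candidate.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def read_text_guessing(path, candidates=("utf-8", "cp1252", "latin-1")):
    # Read raw bytes so nothing is decoded before an encoding is chosen.
    with open(path, "rb") as f:
        return decode_with_fallback(f.read(), candidates)
```

Trying UTF-8 first is useful because bytes that happen to be valid UTF-8 are rarely a coincidence, whereas almost any byte sequence is "valid" CP-1252 or latin-1.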

Community