Python encoding issue (possibly a Windows-to-Linux issue)

Question

I am working on a program that was written in Python under Windows. It reads a CSV file. Here is the relevant part of the code:

with open(os.path.abspath(self.currencies_file_path), 'r') as f:
    reader = csv.reader(f)
    # for each row, find whether such an ISO code exists in the table
    for row in reader:   # this is line 49

And this is the error:

  File "whatever/staticdata.py", line 49, in upload_currencies
    for row in reader:
  File "/usr/lib/python3.4/codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 1307: invalid continuation byte

The CSV file is not even encoded in UTF-8 (I think). Why am I having this kind of issue?

P.S. I don't know anything about encodings.

Do you know which encoding the file was written with? – user590028 Jan 05 '15 at 17:09
@user590028 I don't know. Is there a way to determine it? – khajvah Jan 05 '15 at 17:12

score 2 · Accepted Answer · edited Jun 06 '20 at 05:16

To check the file encoding, you can use the file command:

$ file utils.py
utils.py: Python script, UTF-8 Unicode text executable
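
Where the `file` command is not at hand, a similar check can be sketched in Python. This is only an illustration under assumptions: the helper name `find_utf8_error` and the sample rows are made up, not taken from the question's data. It reports whether some bytes are valid UTF-8 and, if not, where decoding first fails, which is the same information the traceback in the question gives ("byte 0xf3 in position 1307").

```python
def find_utf8_error(data):
    """Return None if data is valid UTF-8, else (offset, offending byte value)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded
        return exc.start, data[exc.start]
    return None


# Hypothetical sample row, encoded two different ways
utf8_row = "CRC,Colón\n".encode("utf-8")
latin1_row = "CRC,Colón\n".encode("latin-1")

print(find_utf8_error(utf8_row))    # valid UTF-8
print(find_utf8_error(latin1_row))  # Latin-1 bytes are not valid UTF-8 here
```

For a real file, the bytes would come from `open(path, "rb").read()`. A clean result only shows that the file is valid UTF-8; a failure does not say which encoding it actually is.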

To convert a file, you can use the iconv command:

iconv -f ascii -t utf-8 utils.py -o utils.utf8.py

Options: -f: from-encoding; -t: to-encoding; -o outputfile.
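
If `iconv` is not available, the same conversion can be approximated in Python. The following is a minimal sketch, not a replacement for `iconv`: the function name `convert_encoding`, the file names, and the sample row are all assumptions made for the example, and it reads the whole file into memory.

```python
import os
import tempfile


def convert_encoding(src_path, dst_path, from_encoding, to_encoding="utf-8"):
    """Re-encode a text file, roughly like `iconv -f ... -t ... -o ...`."""
    # newline="" leaves the original line endings untouched
    with open(src_path, "r", encoding=from_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=to_encoding, newline="") as dst:
        dst.write(text)


# Demo on a throwaway file (paths and contents are illustrative only)
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "currencies.latin1.csv")
dst_path = os.path.join(tmp_dir, "currencies.utf8.csv")

with open(src_path, "wb") as f:
    f.write("CRC,Colón\r\n".encode("latin-1"))

convert_encoding(src_path, dst_path, from_encoding="latin-1")

with open(dst_path, "rb") as f:
    converted = f.read()
print(converted)
```

As with `iconv`, the source encoding has to be named correctly; a wrong guess may still succeed and silently produce wrong characters.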

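Last but not least, explicitly declare the source encoding at the top of the script, right below the shebang. This declaration describes the encoding of the Python source file itself, not of the files the script reads: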

# -*- coding: utf-8 -*-

So, for a working example, you would have something like:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

For a list of encodings supported by iconv, you can type:

iconv -l

edited Jun 06 '20 at 05:16 by Peter
answered Jan 05 '15 at 17:17 by Paco

This is my CSV file: `data/currencies.csv: ISO-8859 text`. Why am I having this kind of problem? – khajvah Jan 05 '15 at 17:20
Windows and Linux use different default encodings. It's usually UTF-8 on Linux. I am pretty sure you don't have an encoding line in your script (the line that starts with `# -*- coding`). If you don't tell Python which one you use, I think it uses ASCII by default. There might be different fallbacks on Linux and Windows. But it's a good idea to be explicit and tell it exactly what it is. – Paco Jan 05 '15 at 17:23
Oh wait a second, you're talking about a CSV file, not a Python file. Still, use the iconv command to convert it. – Paco Jan 05 '15 at 17:25
So Python is trying UTF-8 by default, and that is the reason it doesn't work? – khajvah Jan 05 '15 at 17:26
It must be using either the `locale` encoding (usually UTF-8) or UTF-8 by default. Reading a CSV file goes through the same process as launching a Python script: it uses the locale by default. – Paco Jan 05 '15 at 17:30
Found a "ó" in the CSV; that is where it fails. – khajvah Jan 05 '15 at 17:33
It did, I will accept your answer but one thing: the flag for output is not `-i` but `-o` – khajvah Jan 05 '15 at 17:44
According to the `man`: `--output, -o file Specify output file (instead of stdout).` – Paco Jan 06 '15 at 10:11
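
To see which default the comments above are talking about, the locale's preferred encoding can be printed directly. This is a small sketch rather than a full explanation: exactly which default `open()` picks depends on the Python version and on settings such as UTF-8 mode, so the printed value should be read as an indication only. The variable names are made up for the example.

```python
import codecs
import locale

# Encoding the locale prefers; text-mode open() without `encoding=`
# typically falls back on this (often UTF-8 on Linux, cp1252 on Windows)
default_encoding = locale.getpreferredencoding(False)

# Normalised codec name, e.g. "UTF-8" becomes "utf-8"
normalized = codecs.lookup(default_encoding).name
print(default_encoding, normalized)

# The byte from the traceback, decoded with the encoding `file` reported
raw = b"\xf3"
decoded = raw.decode("latin-1")
print(ascii(decoded))  # ascii() keeps the output printable on any terminal
```

If the printed default is UTF-8 while the file is ISO-8859 text, the byte 0xf3 (the "ó") cannot be decoded, which matches the error in the question.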

score 2 · Answer 2 · answered Jan 05 '15 at 17:31

If you think it's latin-1, try this:

import csv
import io
import os

with io.open(os.path.abspath(self.currencies_file_path), encoding='latin-1') as f:
    reader = csv.reader(f)
    for row in reader:
        pass  # process each row here
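
On Python 3, the built-in `open` accepts the same `encoding` argument, so the idea above can be tried end to end without `io.open`. The sketch below is self-contained and uses a throwaway file; the file name and the rows are made up for the example and stand in for the real currencies file.

```python
import csv
import os
import tempfile

# Throwaway Latin-1 CSV standing in for the real currencies file
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "currencies.csv")
with open(csv_path, "wb") as f:
    f.write("isocode,name\nCRC,Colón\n".encode("latin-1"))

# Name the encoding explicitly instead of relying on the platform default;
# newline="" is what the csv module documentation recommends
with open(csv_path, "r", encoding="latin-1", newline="") as f:
    rows = list(csv.reader(f))

print(len(rows))
```

Passing the encoding explicitly makes the script behave the same on Windows and Linux, whatever the locale is.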

answered Jan 05 '15 at 17:31 by user590028

score 0 · Answer 3 · edited May 23 '17 at 10:09

Windows is probably using CP-1252.

There is no way to know with 100% certainty which encoding a file is using; see this StackOverflow question for reference. If you are using Python 3, just specify the encoding when opening the file. If you're using Python 2, you can use io.open to specify the encoding.
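
A short illustration of why the encoding cannot be known for certain: the same byte decodes without error under more than one single-byte encoding, with different results. The byte chosen here is just an example, and the variable names are made up.

```python
# One byte, two plausible "Windows-style" encodings, two different characters
raw = b"\x80"
as_cp1252 = raw.decode("cp1252")   # the euro sign in CP-1252
as_latin1 = raw.decode("latin-1")  # an invisible control character in Latin-1
print(ascii(as_cp1252), ascii(as_latin1))

# Latin-1 assigns a character to every one of the 256 byte values,
# so decoding as Latin-1 never raises, even when it is the wrong guess
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
print(len(decoded))
```

A decode that succeeds is therefore not proof that the guess was right; knowing where the file came from is still the most reliable guide.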

edited May 23 '17 at 10:09 by Community
answered Jan 05 '15 at 17:35
