12

I'm trying to use the csv module to read a UTF-8 CSV file, and I'm having trouble writing code that works on both Python 2 and 3 because of encoding.

Here is the original code in Python 2.7:

with open(filename, 'rb') as csvfile:
    csv_reader = csv.reader(csvfile, quotechar='\"')
    langs = next(csv_reader)[1:]
    for row in csv_reader:
        pass

But when I run it with Python 3, it complains that I open the file without an "encoding". I tried this:

with codecs.open(filename, 'r', encoding='utf-8') as csvfile:
    csv_reader = csv.reader(csvfile, quotechar='\"')
    langs = next(csv_reader)[1:]
    for row in csv_reader:
        pass

Now Python 2 can't decode the line in the "for" loop. So how should I do it?

Lennart Regebro
Syl
  • 3
    So you want code that runs unchanged both on Python 2.7 and 3? Probably impossible, given that so much has changed with string handling etc. – Tim Pietzcker Mar 03 '11 at 12:20
  • Is it possible to write separate blocks of code for Python 2 and Python 3? – Syl Mar 03 '11 at 12:22
  • 2
    You could check `sys.version` and wrap an `if - else` statement around your code, yes. – Tim Pietzcker Mar 03 '11 at 12:31
  • @Tim Pietzcker: it's better to ask forgiveness than permission. – Jakob Bowyer Mar 03 '11 at 12:41
  • I think you had the b flag in the wrong example, I switched it around. – Lennart Regebro Mar 03 '11 at 13:07
  • @JakobBowyer EAFP works only in named functions, not in generator expressions. This is intentional, which I can tell because [PEP 463](https://www.python.org/dev/peps/pep-0463/) for inline catching was rejected. – Damian Yerrick May 01 '17 at 16:39
  • While the "official" recommendation is to do CSVs differently in Python 2 and Python 3, there is a [cleaner, more elegant way](https://stackoverflow.com/a/39379062/95852) listed as an answer to a [similar, if not duplicate, question](https://stackoverflow.com/questions/38808284/portable-way-to-write-csv-file-in-python-2-or-python-3). – John Y Jan 11 '18 at 14:31

3 Answers

17

Indeed, in Python 2 the file should be opened in binary mode, but in Python 3 it should be opened in text mode. In Python 3 you should also specify newline='' (which you forgot), so that the csv module can handle line endings itself.

You'll have to do the file opening in an if-block.

import sys

if sys.version_info[0] < 3: 
    infile = open(filename, 'rb')
else:
    infile = open(filename, 'r', newline='', encoding='utf8')


with infile as csvfile:
    ...
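
The branch above can be wrapped in a small helper so the rest of the code stays identical on both versions. This is only a sketch: the names `open_for_csv` and `read_rows` are made up for illustration, and the Python 2 branch assumes an ASCII-compatible encoding such as UTF-8, because the csv module there parses raw bytes.

```python
import csv
import sys

PY2 = sys.version_info[0] < 3


def open_for_csv(filename, encoding='utf-8'):
    """Open filename the way the running Python's csv module expects."""
    if PY2:
        # Python 2: csv.reader wants bytes, so open in binary mode.
        return open(filename, 'rb')
    # Python 3: csv.reader wants text; newline='' leaves line endings to csv.
    return open(filename, 'r', newline='', encoding=encoding)


def read_rows(filename, encoding='utf-8'):
    """Return all rows as lists of text strings on either Python version."""
    rows = []
    with open_for_csv(filename, encoding) as csvfile:
        for row in csv.reader(csvfile, quotechar='"'):
            if PY2:
                # Cells are byte strings on Python 2; decode them by hand.
                row = [cell.decode(encoding) for cell in row]
            rows.append(row)
    return rows
```

With this, `langs = read_rows(filename)[0][1:]` would give the header cells from the question on either version.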
Quentin Pradet
Lennart Regebro
  • Can you do `with` on a file handle? – Tim Pietzcker Mar 03 '11 at 13:09
  • 1
    @Tim: It's not a file handle, it's a file object, and you can do `with` on file objects. That's exactly what you do when you do `with open(...`. – Lennart Regebro Mar 03 '11 at 13:20
  • 3
    Makes sense. You never really see it that way, it's always `with open(...)` in the docs, but this way isn't half bad - enables you to wrap the `open()` in a `try` block and catch `File not found` etc. before handing it to the `with` block. – Tim Pietzcker Mar 03 '11 at 13:41
  • 1
    Or, in some situations, `if sys.version < '3': open = codecs.open`. – agf May 13 '12 at 05:25
  • 2
    @agf: Yeah, that can work too. codecs.open and the Python 3 open are not exactly the same, though, so there are subtle traps, but often it will work. In 2.6 and 2.7 you can do `from io import open` instead. – Lennart Regebro May 13 '12 at 07:56
  • The Python 2 CSV reader only works with ASCII, so using 'r' or 'rb' to open the file might solve only part of the issue. – roskakori Oct 08 '16 at 21:56
  • Is this still the right approach, even with six, 2to3 and other libraries? I just noticed how old this answer is. – Davos Sep 29 '17 at 06:36
  • As far as I'm aware, yes. – Lennart Regebro Oct 05 '17 at 09:39
2

Update: While the code in my original answer works, I have since released a small package at https://pypi.python.org/pypi/csv342 that provides a Python 3-like interface for Python 2. So, independent of your Python version, you can simply do:

import csv342 as csv
import io
with io.open('some.csv', 'r', encoding='utf-8', newline='') as csv_file:
    for row in csv.reader(csv_file, delimiter='|'):
        print(row)

Original answer: Here's a solution that actually decodes the text to Unicode strings even on Python 2, and consequently works with encodings other than UTF-8.

The code below defines a function csv_rows() that yields the contents of a file as a sequence of lists. Example usage:

for row in csv_rows('some.csv', encoding='iso-8859-15', delimiter='|'):
    print(row)

Here are the two variants of csv_rows(): one for Python 3+ and another for Python 2.6+. At runtime the code automatically picks the proper variant. UTF8Recoder and UnicodeReader are verbatim copies of the examples in the Python 2.7 library documentation.

import csv
import io
import sys


if sys.version_info[0] >= 3:
    # Python 3 variant.
    def csv_rows(csv_path, encoding, **keywords):
        with io.open(csv_path, 'r', newline='', encoding=encoding) as csv_file:
            for row in csv.reader(csv_file, **keywords):
                yield row

else:
    # Python 2 variant.
    import codecs

    class UTF8Recoder:
        """
        Iterator that reads an encoded stream and reencodes the input to UTF-8
        """
        def __init__(self, f, encoding):
            self.reader = codecs.getreader(encoding)(f)

        def __iter__(self):
            return self

        def next(self):
            return self.reader.next().encode("utf-8")


    class UnicodeReader:
        """
        A CSV reader which will iterate over lines in the CSV file "f",
        which is encoded in the given encoding.
        """

        def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
            f = UTF8Recoder(f, encoding)
            self.reader = csv.reader(f, dialect=dialect, **kwds)

        def next(self):
            row = self.reader.next()
            return [unicode(s, "utf-8") for s in row]

        def __iter__(self):
            return self


    def csv_rows(csv_path, encoding, **kwds):
        with io.open(csv_path, 'rb') as csv_file:
            for row in UnicodeReader(csv_file, encoding=encoding, **kwds):
                yield row
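
One way to try csv_rows() is a small round trip through a temporary file. The sketch below repeats only the Python 3 variant so that it runs on its own; the helper `write_sample`, the file location and the sample data are all made up for illustration.

```python
import csv
import io
import os
import tempfile


def csv_rows(csv_path, encoding, **keywords):
    # Python 3 variant from above, repeated so this sketch is self-contained.
    with io.open(csv_path, 'r', newline='', encoding=encoding) as csv_file:
        for row in csv.reader(csv_file, **keywords):
            yield row


def write_sample(path, encoding):
    # Hypothetical sample data: the euro sign exists in ISO-8859-15
    # but not in ISO-8859-1, so it exercises the encoding argument.
    with io.open(path, 'w', newline='', encoding=encoding) as sample_file:
        sample_file.write(u'item|price\r\ncaf\xe9|3\u20ac\r\n')


sample_path = os.path.join(tempfile.mkdtemp(), 'some.csv')
write_sample(sample_path, 'iso-8859-15')
rows = list(csv_rows(sample_path, encoding='iso-8859-15', delimiter='|'))
print(rows)
```

Because csv_rows() is a generator, the file stays open until the rows have been consumed, which is why the example collects them with list() right away.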
roskakori
0

This is an old question, I know, but I was looking for how to do this, so I'm posting it in case someone comes across this and finds it useful.

This is how I solved mine, thanks to Lennart Regebro for the hint:

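import csv
import sys

# Compare the major version as an integer; comparing sys.version as a
# string would misorder a hypothetical version "10".
if sys.version_info[0] >= 3:
    rd = csv.reader(open(input_file, 'r', newline='', encoding='iso8859-1'),
                    delimiter=';', quotechar='"')
else:
    rd = csv.reader(open(input_file, 'rb'), delimiter=';', quotechar='"')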

Then do what you need to do:

for row in rd:
       ......
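
A note on the version check itself: comparing `sys.version` as a string works for the versions that exist today, but strings compare character by character, so it is fragile. A minimal sketch of the difference, using made-up version values and two hypothetical helper functions:

```python
import sys


def is_at_least_3_by_string(version_string):
    # Lexicographic comparison: "10.0.0" starts with "1", which sorts before "3".
    return version_string > '3'


def is_at_least_3_by_tuple(version_info):
    # Integer comparison of the major version number.
    return version_info[0] >= 3


print(is_at_least_3_by_string('2.7.18'))   # string check on a Python 2 style version
print(is_at_least_3_by_string('10.0.0'))   # string check on a hypothetical version 10
print(is_at_least_3_by_tuple((10, 0, 0)))  # tuple check on the same hypothetical version
print(is_at_least_3_by_tuple(sys.version_info))
```

The tuple form is the one used in the first answer (`sys.version_info[0] < 3`) and avoids the ordering problem.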
jscurtu