182

I have a Python codebase, built for Python 3, which uses the Python 3 style open() with the encoding parameter:

https://github.com/miohtama/vvv/blob/master/vvv/textlineplugin.py#L47

    with open(fname, "rt", encoding="utf-8") as f:

Now I'd like to backport this code to Python 2.x, so that I would have a codebase which works with Python 2 and Python 3.

What's the recommended strategy to work around the open() differences and the lack of an encoding parameter in Python 2?

Could I have a Python 3 open() style file handler which streams bytestrings, so it would act like Python 2 open()?

TuringTux
Mikko Ohtamaa

6 Answers

191

1. To get an encoding parameter in Python 2:

If you only need to support Python 2.6 and 2.7, you can use io.open instead of open. io is the new I/O subsystem for Python 3, and it exists in Python 2.6 and 2.7 as well. Please be aware that in Python 2.6 (as well as 3.0) it's implemented purely in Python and is very slow, so if you need speed when reading files, it's not a good option.

If you need speed, and you need to support Python 2.6 or earlier, you can use codecs.open instead. It also has an encoding parameter, and is quite similar to io.open except it handles line-endings differently.
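As a minimal sketch of the two options above (the temporary file and its contents are made up for illustration), the following reads the same bytes with io.open and with codecs.open. It assumes io.open's default newline handling; the difference shows up in how "\r\n" is treated:

```python
import codecs
import io
import os
import tempfile

# Hypothetical sample file with Windows-style line endings, written as raw bytes.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(u"h\u00e9llo\r\nworld\r\n".encode("utf-8"))

# io.open in text mode decodes and, by default, translates "\r\n" to "\n".
with io.open(path, "rt", encoding="utf-8") as f:
    via_io = f.read()

# codecs.open opens the underlying file in binary mode when an encoding
# is given, so line endings are left untouched.
with codecs.open(path, "r", encoding="utf-8") as f:
    via_codecs = f.read()
```

Both calls return unicode text on Python 2 and str on Python 3; only the line endings differ.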

2. To get a Python 3 open() style file handler which streams bytestrings:

open(filename, 'rb')

Note the 'b', meaning 'binary'.

Lennart Regebro
  • The 'b' actually means binary mode, not bytes. See https://docs.python.org/3/library/functions.html#open. – pmdarrow Oct 09 '14 at 18:34
  • @pmdarrow Same thing in this case, but strictly speaking, yes. – Lennart Regebro Oct 13 '14 at 08:02
  • I ran into the problem that you cannot run a regex over a byte stream for option 2 ;) – Jonathan Komar Jun 10 '15 at 15:37
  • @macmadness86 You need to use a bytes regular expression. – Lennart Regebro Jun 11 '15 at 06:53
  • A note from the porting howto: "Do not bother with the outdated practice of using codecs.open() as that’s only necessary for keeping compatibility with Python 2.5." https://docs.python.org/3/howto/pyporting.html – Al Sweigart Sep 05 '18 at 20:34
  • I did a very simple benchmark between `io.open()` and `codecs.open()` in Python 2.7 with the `utf-8-sig` encoding on a large file, and I was very surprised to find that the former performed much better on my machine than the latter. – Darragh Enright Oct 10 '19 at 09:11
  • Yes. `codecs` is only faster on Python 2.6 (and possibly on 3.0). With the `encoding=` parameter in `io.open()` you really don't need `codecs.open()` any more, afaik. – Lennart Regebro Oct 10 '19 at 11:06
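The point about regular expressions raised in the comments can be sketched as follows. This is only an illustration: the in-memory stream and the `id=` pattern are made-up stand-ins for a real file opened with 'rb'.

```python
import io
import re

# In-memory stand-in for a file opened with open(filename, 'rb').
stream = io.BytesIO(b"id=42\nid=7\n")

# The pattern must be bytes as well; the br"..." prefix is accepted
# by Python 2.6+ and by Python 3.
pattern = re.compile(br"id=(\d+)")

# Iterating a binary stream yields bytes lines, which a bytes pattern can match.
ids = [m.group(1) for line in stream for m in pattern.finditer(line)]
```

The matches are bytes too, so they need an explicit `.decode()` before being mixed with text.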
74

I think

from io import open

should do.

mfussenegger
  • I think Lennart's response below is much better as it provides more explanation and the caveat about the io module being slow in 2.x along with the suggestion to use codecs.open. – gps Jun 11 '12 at 19:10
  • What happens if I use `from io import open` in Python 3? I do not care for performance currently. – matth Jul 06 '16 at 14:17
  • @matth In python3 open from io is an alias for the built-in open. See https://docs.python.org/3/library/io.html?highlight=io#io.open – mfussenegger Jul 06 '16 at 20:27
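The alias mentioned in the last comment can be checked directly. This is a small sketch; the `same_object` name is only for illustration, and the result is expected to be True on Python 3 and False on Python 2, where the built-in open() is the old file-based one.

```python
import io
import sys

# The module holding built-ins is named differently on each major version.
if sys.version_info[0] >= 3:
    import builtins
else:
    import __builtin__ as builtins

# On Python 3, io.open and the built-in open are the same function object,
# so "from io import open" changes nothing there.
same_object = io.open is builtins.open
```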
31

Here's one way:

with open("filename.txt", "rb") as f:
    contents = f.read().decode("UTF-8")

Here's how to do the same thing when writing:

with open("filename.txt", "wb") as f:
    f.write(contents.encode("UTF-8"))
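The two snippets above can be wrapped into a pair of helpers. This is a sketch only: `read_text` and `write_text` are hypothetical names, and the temporary file is just there to show a round trip.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read the whole file as bytes, then decode it to text."""
    with open(path, "rb") as f:
        return f.read().decode(encoding)


def write_text(path, text, encoding="utf-8"):
    """Encode text to bytes, then write it in binary mode."""
    with open(path, "wb") as f:
        f.write(text.encode(encoding))


# Round trip through a temporary file (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), "filename.txt")
write_text(path, u"na\u00efve caf\u00e9\n")
contents = read_text(path)
```

Because the file is opened in binary mode, no newline translation happens, and the whole file is held in memory at once, so this fits small files better than large ones.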
Flimm
9

This may do the trick:

import sys
if sys.version_info[0] > 2:
    # Python 3: the built-in open() already accepts encoding
    pass
else:
    # Python 2: shadow the built-in open() with a wrapper around codecs.open
    import codecs
    import warnings
    def open(file, mode='r', buffering=-1, encoding=None,
             errors=None, newline=None, closefd=True, opener=None):
        # Warn about Python 3 parameters that codecs.open cannot honor
        if newline is not None:
            warnings.warn('newline is not supported in py2')
        if not closefd:
            warnings.warn('closefd is not supported in py2')
        if opener is not None:
            warnings.warn('opener is not supported in py2')
        return codecs.open(filename=file, mode=mode, encoding=encoding,
                           errors=errors, buffering=buffering)

Then you can keep your code in the Python 3 style.

Note that some parameters, such as newline, closefd, and opener, are not supported on Python 2.

TylerTemp
2

If you are using six, you can try the following, which uses the Python 3 API and runs on both Python 2 and Python 3:

import six

if six.PY2:
    # FileNotFoundError is only available since Python 3.3
    FileNotFoundError = IOError
    from io import open

fname = 'index.rst'
try:
    with open(fname, "rt", encoding="utf-8") as f:
        pass
        # do_something_with_f ...
except FileNotFoundError:
    print('Oops.')

Dropping Python 2 support later is then just a matter of deleting everything related to six.

YaOzI
1

Not a general answer, but it may be useful for the specific case where you are happy with the default Python 2 encoding but want to specify UTF-8 for Python 3:

import sys

if sys.version_info.major > 2:
    do_open = lambda filename: open(filename, encoding='utf-8')
else:
    do_open = lambda filename: open(filename)

with do_open(filename) as file:
    pass
MarkH