I don't seem to be able to open a file that has a Unicode filename. Let's say I do:

for i in os.listdir():
    open(i, 'r')

When I search for a solution, I always get pages about how to read and write a Unicode string to a file, not how to open a file that has a Unicode name with file() or open().

tshepang

2 Answers

Simply pass open() a unicode string for the file name:

In Python 2.x:

>>> open(u'someUnicodeFilenameλ')
<open file u'someUnicodeFilename\u03bb', mode 'r' at 0x7f1b97e70780>

In Python 3.x, all strings are Unicode, so there is literally nothing to it.

As always, note that the best way to open a file is with the with statement in conjunction with open().
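As a rough illustration of the two points above for Python 3, here is a minimal sketch that writes and reads a file with a non-ASCII name using the with statement. The helper name `write_and_read` and the temporary directory are assumptions made for the example, not part of the original answer.

```python
import os
import tempfile


def write_and_read(directory, name, text):
    """Create a file with the given (possibly non-ASCII) name and read it back."""
    path = os.path.join(directory, name)
    # The with statement closes the file even if an exception is raised.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    # "\u03bb" is the Greek letter lambda, as in the filename used above.
    content = write_and_read(tmp, "someUnicodeFilename\u03bb", "hello")
    print(content)
```

Note that the `encoding` argument here controls the file's contents only; the filename itself is converted by Python using the filesystem encoding.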

Edit: With regard to os.listdir(), the advice again varies by version. Under Python 2.x, you have to be careful:

os.listdir(), which returns filenames, raises an issue: should it return the Unicode version of filenames, or should it return 8-bit strings containing the encoded versions? os.listdir() will do both, depending on whether you provided the directory path as an 8-bit string or a Unicode string. If you pass a Unicode string as the path, filenames will be decoded using the filesystem’s encoding and a list of Unicode strings will be returned, while passing an 8-bit path will return the 8-bit versions of the filenames.

Source: Python Unicode HOWTO (http://docs.python.org/howto/unicode.html#unicode-filenames)

So in short, if you want Unicode out, put Unicode in:

>>> os.listdir(".")
['someUnicodeFilename\xce\xbb', 'old', 'Dropbox', 'gdrb']
>>> os.listdir(u".")
[u'someUnicodeFilename\u03bb', u'old', u'Dropbox', u'gdrb']

Note that the file will still open either way. As an 8-bit string the name won't be represented well within Python, but it will still work:

>>> open('someUnicodeFilename\xce\xbb')
<open file 'someUnicodeFilenameλ', mode 'r' at 0x7f1b97e70660>

Under 3.x, os.listdir() returns Unicode (str) names by default; you only get bytes back if you pass a bytes path.
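The "what you put in is what you get out" rule also carries over to Python 3, with str and bytes taking the place of unicode and 8-bit strings. A small sketch, assuming a throwaway temporary directory and a hypothetical helper named `list_both_ways`:

```python
import os
import tempfile


def list_both_ways(directory):
    """Return the directory listing as str names and as bytes names."""
    # A str path yields str names.
    as_text = sorted(os.listdir(directory))
    # A bytes path yields bytes names, encoded with the filesystem encoding.
    as_bytes = sorted(os.listdir(os.fsencode(directory)))
    return as_text, as_bytes


with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file whose name ends in the Greek letter lambda.
    open(os.path.join(tmp, "someUnicodeFilename\u03bb"), "w").close()
    text_names, byte_names = list_both_ways(tmp)
    print(text_names)
    print(byte_names)
```

Either form of the name can be handed back to open(), so for most code the str listing is the simpler choice.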

Gareth Latty
  • What if I use something like `os.listdir()`, not creating the unicode strings myself? –  Apr 16 '12 at 20:05
  • I would imagine that if you have a Unicode supporting filesystem, os.listdir() will return unicode strings. – Gareth Latty Apr 16 '12 at 20:08
  • @user975135 Edited to add a note about Python 2.x and `os.listdir()` with Unicode. – Gareth Latty Apr 16 '12 at 20:10
  • Hm, Ok. I know this is off, but is any other encoding possible? –  Apr 16 '12 at 20:16
  • @user975135 *The sys.getfilesystemencoding() function returns the encoding to use on your current system, in case you want to do the encoding manually, but there’s not much reason to bother. When opening a file for reading or writing, you can usually just provide the Unicode string as the filename, and it will be automatically converted to the right encoding for you.* [Source](http://docs.python.org/howto/unicode.html#unicode-filenames) – Gareth Latty Apr 16 '12 at 20:22
  • Note that the filesystemencoding is never a UTF under Windows, so any character that's not in your locale code page will cause the non-Unicode version to fail. – bobince Apr 16 '12 at 20:58
You can try this:

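import os
import sys

directory = u"/your-directory-path/"
for filename in os.listdir(directory):
    # listdir() returns bare names, so join each with its directory
    path = os.path.join(directory, filename)
    # Encode manually with the filesystem encoding before opening
    with open(path.encode(sys.getfilesystemencoding()), "r") as f:
        pass  # work with f here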
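For Python 3, a comparable sketch could use os.fsencode(), the standard-library helper that applies the filesystem encoding, instead of calling encode() by hand. The function name `open_each` and the choice to return file sizes are assumptions made only so the example has something to show; open() would also accept the str path directly.

```python
import os
import tempfile


def open_each(directory):
    """Open every regular file in directory via a manually encoded bytes path."""
    sizes = {}
    for name in os.listdir(directory):
        # listdir() returns bare names, so build the full path first.
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        # os.fsencode() converts the str path to bytes using the filesystem encoding.
        with open(os.fsencode(path), "rb") as f:
            sizes[name] = len(f.read())
    return sizes


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "someUnicodeFilename\u03bb"), "wb") as f:
        f.write(b"abc")
    print(open_each(tmp))
```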
Thanasis Petsas