Handling UTF filenames in Windows

Question

Given the following files:

E:/Media/Foo/info.nfo
E:/Media/Bar/FXG™.nfo

I can "find" them with the following:

import fnmatch
import os

BASE = r'E:/Media/'

for dirpath, _, files in os.walk(BASE):
    for f in fnmatch.filter(files, '*.nfo'):
        nfopath = os.path.join(dirpath, f)
        print(nfopath)

This snippet would then print the above paths.

However, if I make sure that each path created by os.path.join() is indeed a regular file -- for example with something like:

for dirpath, _, files in os.walk(BASE):
    for f in fnmatch.filter(files, '*.nfo'):
        nfopath = os.path.join(dirpath, f)
        print(nfopath)
        assert os.path.isfile(nfopath)   # <------

The assertion fails for the second filename, but not for the first.

I checked the folder in explorer, and the script indeed found a regular file and printed the name and path correctly, so I'm not clear on why the assertion failed.

I've tried specifying the BASE string as a Unicode string (ur'E:/Media/') as well as explicitly encoding nfopath inside the isfile() call (assert os.path.isfile(nfopath.encode('utf-8'))).

Neither seemed to work.

Of course, I could keep track of the failing files and delete them manually, but I'm interested in how one would handle this correctly.

Thanks in advance.

(Python 2.7, Windows 7)

I can't tell you for sure, but the file APIs were reworked in 3.x to deal with unicode much better in windows. Can you try 3.x? — Max, Jul 15 '14 at 21:15
Try printing `repr(nfopath)` and see if that sheds some light. — Mark Ransom, Jul 17 '14 at 02:37
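
The repr() suggestion in the comment above can be sketched roughly as follows. In Python 2, repr() shows a byte string as '...' and a Unicode string as u'...', with non-ASCII characters escaped, so it reveals which type os.walk() handed back and what the name really contains. Here report_nfo is a hypothetical helper, and a throwaway temporary directory stands in for E:/Media/; this is a diagnostic sketch, not a fix.

```python
from __future__ import print_function

import fnmatch
import os
import shutil
import tempfile


def report_nfo(base):
    """Return (repr(path), isfile) for every *.nfo file under base.

    Hypothetical helper: it only reports, it does not repair anything.
    """
    rows = []
    for dirpath, _, files in os.walk(base):
        for f in fnmatch.filter(files, '*.nfo'):
            path = os.path.join(dirpath, f)
            # repr() exposes the string type and any escaped characters.
            rows.append((repr(path), os.path.isfile(path)))
    return rows


# Assumption: a temporary directory stands in for E:/Media/.
base = tempfile.mkdtemp()
open(os.path.join(base, 'info.nfo'), 'w').close()

rows = report_nfo(base)
shutil.rmtree(base)

for shown, ok in rows:
    print(shown, ok)
```

Running the same helper against the real BASE would show, for the failing file, whether the name came back as bytes or as text and which escapes it contains.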

score 1 · Answer 1 · edited May 23 '17 at 11:57

According to this SO question, Windows stores file names as UTF-16 when using the NTFS filesystem. Retry your encoding step with UTF-16.
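
As a small illustration of why the choice of encoding matters here, the sketch below encodes one name three ways. It assumes the second file name contains the trade mark sign (U+2122), which is a guess based on how the name is rendered in the question. It only shows that the byte sequences differ; whether passing any of them to os.path.isfile() helps is not something it establishes.

```python
from __future__ import print_function

# Assumption: the problematic name contains U+2122 TRADE MARK SIGN.
name = u'FXG\u2122.nfo'

utf8 = name.encode('utf-8')        # 3 bytes for U+2122
utf16 = name.encode('utf-16-le')   # 2 bytes per character, no BOM
cp1252 = name.encode('cp1252')     # 1 byte for U+2122 (0x99)

for label, data in (('utf-8', utf8), ('utf-16-le', utf16), ('cp1252', cp1252)):
    print(label, len(data), repr(data))

# Decoding the UTF-8 bytes with the wrong code page yields mojibake.
misread = utf8.decode('cp1252')
print(repr(misread))
```

The last step shows how UTF-8 bytes read as cp1252 turn one character into three, which is the usual source of garbled names in listings.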

answered Jul 16 '14 at 23:09 by skrrgwasme
