
I have two distinct files called:

'╠.txt' and '¦.txt'

This simple code:

import os

files = os.listdir(r'E:\pub\private\desktop')
for f in files:
    print f, repr(f), type(f)

returns:

¦.txt '\xa6.txt' <type 'str'>
¦.txt '\xa6.txt' <type 'str'>

I don't understand why I am getting the code 0xA6 for the ╠ character instead of 0xCC. I have been trying to play around with the encode/decode methods, without success. I have noticed that sys.getfilesystemencoding() is set to mbcs, but I can't manage to change it to something like cp437.
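The two byte values in the question come from two different legacy code pages. A quick way to see this is to look both characters up in cp437 (where ╠ is 0xCC) and cp1252 (where ¦ is 0xA6). This is a Python 3 sketch, not part of the original post, and it assumes the ANSI code page behind `mbcs` is cp1252, which is typical for Western-European Windows installs but not guaranteed:

```python
# Sketch: look up the two characters from the question in two legacy
# single-byte code pages. Assumption: cp1252 is the ANSI code page that
# "mbcs" resolves to on the asker's machine (common, not guaranteed).

def byte_in(codec, ch):
    """Return ch encoded with codec, or None if the codec cannot represent it."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


box = "\u2560"  # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
bar = "\u00a6"  # BROKEN BAR

for ch in (box, bar):
    # Print the code point rather than the glyph so output stays ASCII-safe.
    print(f"U+{ord(ch):04X}", "cp437:", byte_in("cp437", ch),
          "cp1252:", byte_in("cp1252", ch))
```

cp1252 has no slot for ╠ at all, so a byte-string listing in that code page cannot represent the name faithfully, while cp437 has no slot for ¦.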

Any help is very much appreciated. Thanks!

Makoto
da_chinese
  • OT for your actual question, but when working with paths on Windows, do one of the following: (1) use raw strings `r"E:\whatever"`, (2) use forward slashes `"E:/whatever"`, or (3) double the backslashes `"E:\\whatever"`. That avoids another question here when your paths get misinterpreted because your string literal contains `\n`, `\t`, or a similar escape sequence. – bgporter Jun 21 '11 at 14:05
  • Using Python 3, with `sys.getfilesystemencoding()` returning `'mbcs'`, I don't seem to be encountering your problem. @bgporter: Yeah, the way he has it set up it looks like the string is never closed properly. – JAB Jun 21 '11 at 14:17
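bgporter's three spellings can be checked side by side. This Python 3 sketch uses the question's directory purely as sample text and relies on the standard `ntpath` module, which applies Windows path rules on any operating system; it also shows the pitfall of an ordinary literal where `\t` and `\n` are read as escapes:

```python
import ntpath

raw = r"E:\pub\private\desktop"        # (1) raw string
forward = "E:/pub/private/desktop"     # (2) forward slashes
doubled = "E:\\pub\\private\\desktop"  # (3) doubled backslashes

print(raw == doubled)                   # identical characters
print(ntpath.normpath(forward) == raw)  # same path once separators are normalized

# The pitfall: meant as a path, but "\t" becomes a TAB and "\n" a newline.
broken = "C:\temp\new"
print("\\" in broken)                   # no backslash survived
print("\t" in broken, "\n" in broken)
```

Forward slashes differ from the other two as text, but Windows path APIs accept either separator, which is why normalizing them makes the comparison succeed.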

1 Answer


You have to pass a unicode string to os.listdir and Python will return unicode filenames:

import os

# unicode (u) + raw (r): the backslashes in this path are not treated as escapes
path = ur"E:\pub\private\desktop"
print os.listdir(path)
# [u'\xa6.txt', u'\u2560.txt']

Windows NT actually stores filenames as Unicode, but I guess Python encodes them (with the `mbcs` filesystem encoding) when you pass it an encoded byte-string path name.
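Python 3 keeps the rule the answer relies on: the type of the path argument decides the type of the names `os.listdir` returns, except that `str` (text) is now the default and `bytes` is the opt-in. On Windows, Python 3.6 and later also encode bytes paths as UTF-8 instead of `mbcs` (PEP 529). This is a minimal sketch, not from the original thread; the temporary directory is an assumption standing in for the real desktop folder:

```python
import os
import sys
import tempfile

# The two file names from the question.
names = {"\u2560.txt", "\u00a6.txt"}  # '╠.txt' and '¦.txt'

with tempfile.TemporaryDirectory() as d:
    for n in names:
        open(os.path.join(d, n), "w").close()  # create empty files

    as_text = os.listdir(d)                # str path   -> list of str
    as_bytes = os.listdir(os.fsencode(d))  # bytes path -> list of bytes

print(sys.getfilesystemencoding())
print([ascii(n) for n in as_text])
print(as_bytes)
```

`os.fsdecode` turns each bytes name back into the same `str`, so in Python 3 neither listing loses the ╠ character.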

Jochen Ritzel
  • This was indeed the issue. The following code now fixes my mojibake:

        path = ur"E:\pub\private\desktop\TestMp3"
        files = os.listdir(path)
        for f in files:
            print f, repr(f), type(f)
            msdos = f.encode('cp437')
            sjis = msdos.decode('shift-jis')
            print f, repr(sjis), sjis

    Thanks a lot! – da_chinese Jun 22 '11 at 17:02
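The fix in that comment is the usual mojibake round trip: re-encode with the code page that was wrongly used for decoding (cp437) to get the raw bytes back, then decode them with the correct one (Shift-JIS). A Python 3 sketch of the same idea; the helper name `repair_cp437_mojibake` and the sample text are hypothetical, and it assumes the garbled names really are Shift-JIS bytes that were read as cp437:

```python
def repair_cp437_mojibake(garbled, real_codec="shift_jis"):
    """Hypothetical helper: undo text that was decoded as cp437 by mistake.

    Re-encoding with cp437 recovers the original bytes (cp437 maps all 256
    byte values, so the round trip is lossless); decoding those bytes with
    the real codec gives back the intended text.
    """
    return garbled.encode("cp437").decode(real_codec)


original = "\u30c6\u30b9\u30c8"  # sample text: katakana for "test"
# Simulate the bug: Shift-JIS bytes interpreted as cp437.
garbled = original.encode("shift_jis").decode("cp437")
fixed = repair_cp437_mojibake(garbled)

print(ascii(garbled))
print(ascii(fixed), fixed == original)
```

The trick only works because the wrong codec mapped every byte; if it had dropped or replaced bytes while decoding, re-encoding could not recover them.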