
I use the following lines to list the file names in a folder.

# -*- coding: utf8 -*-

import os, sys

reload(sys)  
sys.setdefaultencoding('utf8')

file_folder = "C:\\tem\\"

root, dirs, files = os.walk(file_folder).next()

for path, subdirs, files in os.walk(root):
    for f in files:
        print file_folder + f

It doesn't work when the folder contains .xlsb (Excel Binary Workbook) files.

The error returned is:

[Decode error - output not utf-8]

I tried changing the last line to encode/decode the names, but it still doesn't work.

How can I have them displayed properly?

martineau
Mark K
  • 1
    I don't think it's because of the file extension but rather the file name. What are the full path + file name to the files that failed? Or maybe just don't set the system default encoding at all. – r.ook Jan 24 '18 at 02:24
  • @Idlehands, you are right, it's not about the file extension. actually the problem lies in file names. let me delete this question in 10 minutes. anyway if the file names contains non utf-8 characters, the lines shall be working... – Mark K Jan 24 '18 at 02:27
  • 1
    check out the answer here https://stackoverflow.com/questions/2276200/changing-default-encoding-of-python#17628350 Python 3 is the fix. but if you absolutely need python2 then try print(u"{}".format(file_folder + f)) it could help. – Back2Basics Jan 24 '18 at 02:30
  • @Back2Basics, thank you for the link and the line. I tried print(u"{}".format(file_folder + f)) and it still fails. The reason is that some file names contain a dash. – Mark K Jan 24 '18 at 02:38
  • @MarkK maybe instead of `setdefaultencoding()` you can `getdefaultencoding()` instead? I'm not sure what's your purpose for that line but `os.walk()` shouldn't require encoding. Have you tried running without that line? Also dashes *shouldn't* cause the error... unless they're some special characters that looks like a dash but isn't. – r.ook Jan 24 '18 at 03:00
  • 1
    I suggest using [`glob`](https://docs.python.org/3/library/glob.html#module-glob) in the standard library. – martineau Jan 24 '18 at 03:01
  • @Idlehands, thank you for the follow-up. I tried getdefaultencoding() and it gives the same warning. I also tried os.listdir instead of os.walk() to list the files, and the result is still the same... – Mark K Jan 24 '18 at 08:38
  • @martineau, thank you. I tried glob but returned the same error.. – Mark K Jan 24 '18 at 08:38

1 Answer


Your output terminal doesn't handle the Unicode characters in some of your filenames. The easiest solution is to write the output to a UTF-8-encoded file and then read the result with a Unicode-capable editor using a font that supports the Unicode characters used. Another solution where you can still use print is to get a UTF-8-capable IDE.

#!python2
import os,io

root = u'C:\\tem'  # Use a Unicode root with os.walk() to get Unicode filenames.

with io.open('files.txt','w',encoding='utf8') as out:
    for path, subdirs, files in os.walk(root):
        for f in files:
            out.write(os.path.join(path,f) + u'\n')
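For readers on Python 3, where every `str` is already Unicode, the same idea can be sketched as follows. This is only an illustration under that assumption, not part of the original answer; the helper name `write_file_list` and its return value are hypothetical.

```python
import os


def write_file_list(root, out_path):
    """Write the path of every file under root to out_path, one per line, as UTF-8.

    Hypothetical helper for illustration; returns the number of paths written.
    """
    count = 0
    # Python 3's built-in open() accepts an encoding argument directly.
    with open(out_path, "w", encoding="utf-8") as out:
        # A str root makes os.walk() yield str (Unicode) names.
        for path, subdirs, files in os.walk(root):
            for f in files:
                out.write(os.path.join(path, f) + "\n")
                count += 1
    return count
```

Calling `write_file_list("C:\\tem", "files.txt")` would mirror the Python 2 code above. The output file should live outside the folder being walked, otherwise it will appear in its own listing.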

Notes:

Mark Tolonen
  • Thank you. I followed the same approach and it still gives "UnicodeDecodeError: 'ascii' codec can't decode byte 0xa8 in position 68: ordinal not in range(128)". It works fine if the folder doesn't contain files whose names include the character that looks like a dash. – Mark K Jan 25 '18 at 01:27
  • 1
    @MarkK you used the *exact* code above? That particular error only occurs if you mix byte strings and Unicode strings. – Mark Tolonen Jan 25 '18 at 04:19
  • Superb! Excuse my carelessness. The u' prefix matters for the root path: without it, the code doesn't work; with it, it works perfectly! – Mark K Jan 25 '18 at 09:39
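
Why the u' prefix matters can be illustrated with a small sketch. The type of the root passed to os.walk() decides the type of the names it returns: a text root gives text names, a byte-string root gives byte-string names. In Python 2, joining byte-string names to u'\n' triggers an implicit ASCII decode, which is the likely source of the UnicodeDecodeError quoted above. The sketch below is written for Python 3, where the same split exists as `str` versus `bytes`; the helper `name_types` and the demo file name are hypothetical.

```python
import os
import tempfile


def name_types(root):
    """Return the set of Python types os.walk() yields for file names under root."""
    return {type(f) for _, _, files in os.walk(root) for f in files}


with tempfile.TemporaryDirectory() as demo_root:
    # Demo file whose name contains an en dash (U+2013), which looks like a hyphen.
    open(os.path.join(demo_root, "report \u2013 2018.xlsb"), "w").close()

    text_types = name_types(demo_root)               # text root: text names
    byte_types = name_types(os.fsencode(demo_root))  # bytes root: bytes names

print(text_types, byte_types)
```

In Python 3, mixing the two kinds raises a TypeError immediately, whereas Python 2 attempts the ASCII decode and fails only when a name contains a non-ASCII byte.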