Many of us have probably run into this, but I am weak at Unicode handling. Here is the issue: in the snippet below I walk a folder looking for an .exe file to execute and check whether each file path exists, but no luck:

#Python 2.6.7

import os

filePath = 'C:\\Test\\'  # Test folder containing the file BitComet_比特彗星_1_25.exe

for (adir, dirs, files) in os.walk(filePath):
    for f in files:
        path = os.path.join(adir, f)
        if os.path.exists(path):
            print 'Path Found', path
            #Extract file
            #logging(path)
        else:
            print 'Path Not Found'
            #logging(path)

I always get the result 'Path Not Found'. I tried path.decode('utf-8'), but the script then reads the file path as:

C:\Test\BitComet_????_1_25.exe    

And since this file path doesn't exist, it goes to the else branch.

Please give me a hint on how to handle this Unicode issue, and on whether it is better to show the file path to the user on cmd or in a log file.

I apologize if this seems to be a duplicate post.

Martijn Pieters
Shashi

1 Answer

Windows stores file names as UTF-16. Python can handle this for you: pass a unicode path to os.walk() and you'll get unicode results back instead of byte strings:

filePath = u'C:\\Test\\'  # Test folder containing the file BitComet_比特彗星_1_25.exe

for (adir, dirs, files) in os.walk(filePath):
    for f in files:
        path = os.path.join(adir, f)  # now a unicode object
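
To try the unicode-path approach without touching a real folder, here is a minimal sketch. It assumes a scratch directory from tempfile stands in for C:\Test\, and the helper names walk_existing and demo are hypothetical, not part of the question. The file name is written with \u escapes so the script itself stays pure ASCII.

```python
import io
import os
import shutil
import sys
import tempfile


def walk_existing(root):
    """Return every file path under root that os.path.exists() confirms."""
    found = []
    for adir, dirs, files in os.walk(root):
        for f in files:
            path = os.path.join(adir, f)
            if os.path.exists(path):
                found.append(path)
    return found


def demo():
    # Hypothetical scratch folder standing in for C:\Test\
    root = tempfile.mkdtemp()
    if isinstance(root, bytes):
        # Python 2 returns a byte string here; decode it so that
        # os.walk() is given a unicode path and yields unicode names.
        root = root.decode(sys.getfilesystemencoding())
    name = u'BitComet_\u6bd4\u7279\u5f57\u661f_1_25.exe'
    try:
        # Create an empty file with the non-ASCII name.
        with io.open(os.path.join(root, name), 'wb') as handle:
            handle.write(b'')
        return name, walk_existing(root)
    finally:
        shutil.rmtree(root)


if __name__ == '__main__':
    name, paths = demo()
    print(len(paths))
```

The only change that matters for the question is that the root handed to os.walk() is a unicode object; the loop body is the same as in the original snippet.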
Martijn Pieters
  • Awesome, it works. One more small question: when I try to print the path, I get the error UnicodeEncodeError: 'charmap' codec can't encode characters in position 55-58: character maps to <undefined>. How do I handle it? – Shashi Apr 05 '13 at 12:13
  • That is because the Windows console is *terrible* at printing unicode. See [Python, Unicode, and the Windows console](http://stackoverflow.com/q/5419) – Martijn Pieters Apr 05 '13 at 12:14
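
For the cmd-versus-log-file part of the question, one possible way to avoid the UnicodeEncodeError is sketched below. It is not taken from the linked thread. It assumes the console uses code page cp437 (yours may differ), and the helper names console_safe and log_path are hypothetical. The idea is to degrade unencodable characters to '?' for the console while writing the exact path to a UTF-8 log file.

```python
import io


def console_safe(text, encoding):
    """Replace characters the given console encoding cannot represent with '?'."""
    return text.encode(encoding, 'replace').decode(encoding)


def log_path(log_file, path):
    """Append one path to a UTF-8 encoded log file, keeping every character."""
    with io.open(log_file, 'a', encoding='utf-8') as handle:
        handle.write(path + u'\n')


if __name__ == '__main__':
    path = u'C:\\Test\\BitComet_\u6bd4\u7279\u5f57\u661f_1_25.exe'
    # 'cp437' is an assumed console code page; substitute your own.
    print(console_safe(path, 'cp437'))
```

The console output loses the Chinese characters, so the log file is the better place to record the path exactly; the console line is only a readable approximation.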