Handling non-ASCII characters in a Python string

Question

I have a file named "SSE-Künden, SSE-Händler.pdf", which contains two non-ASCII characters (ü, ä). When I print this filename in the Python interpreter, those characters are shown as escape codes instead, 'SSE-K\x81nden, SSE-H\x84ndler.pdf', but I want to see the original characters.

The test directory contains the PDF file named 'SSE-Künden, SSE-Händler.pdf'.

I tried this:

    import os

    path = r'C:\test'  # raw string, otherwise '\t' is read as a tab character
    for a, b, c in os.walk(path):
        print c

and got:

['SSE-K\x81nden, SSE-H\x84ndler.pdf']

How do I convert these characters to their respective Unicode values? I want to show the original name ("SSE-Künden, SSE-Händler.pdf") in the interpreter and also write it into a file as is. How do I achieve this? I am using Python 2.6 on Windows.

Thanks.

If you're using Ubuntu, Terminal (from the menu) --> Set Character Encoding — user183037, Sep 22 '11 at 07:00

Mark Tolonen · Accepted Answer · score 3 · 2011-09-22T07:06:54.130

Assuming your terminal supports displaying the characters, iterate over the list of files and print them individually (or use Python 3, which displays Unicode in lists):

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> for p,d,f in os.walk(u'.'):
...  for n in f:
...   print n
...
SSE-Künden, SSE-Händler.pdf

Also note I used a Unicode string (u'.') for the path. This instructs os.walk to return Unicode strings as opposed to byte strings. When dealing with non-ASCII filenames this is a good idea.
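The same rule still holds in Python 3: the type of the top directory passed to os.walk decides the type of the names it yields. The sketch below is a Python 3-only illustration, not part of the original answer; it assumes a file system that can store the test name, and the temporary directory and file are created just for the demonstration. A text (str) path gives str names, while a bytes path (built here with os.fsencode) gives undecoded bytes names.

```python
import os
import tempfile

# Test file name, spelled with escapes so the source file stays pure ASCII.
name = "SSE-K\u00fcnden, SSE-H\u00e4ndler.pdf"

with tempfile.TemporaryDirectory() as tmp:
    # Create an empty file with the non-ASCII name.
    open(os.path.join(tmp, name), "w").close()

    # A str top directory makes os.walk yield str (Unicode) file names...
    text_names = next(os.walk(tmp))[2]
    # ...while a bytes top directory makes it yield raw bytes file names.
    byte_names = next(os.walk(os.fsencode(tmp)))[2]

print(text_names == [name])
print(type(byte_names[0]).__name__)
print(os.fsdecode(byte_names[0]) == name)
```

os.fsdecode turns the bytes back into text using the file system encoding, which is what passing a text path does for you automatically.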

In Python 3 strings are Unicode by default and non-ASCII characters are displayed to the user instead of displayed as escape codes:

Python 3.2.1 (default, Jul 10 2011, 21:51:15) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> for p,d,f in os.walk('.'):
...  print(f)
...
['SSE-Künden, SSE-Händler.pdf']
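The difference between the two sessions is only in how a list is displayed: printing a list shows the repr of each element. A small Python 3 sketch of this, assuming the console can print the characters: repr keeps printable non-ASCII characters, while the built-in ascii() escapes them, much like Python 2's repr did.

```python
names = ["SSE-K\u00fcnden, SSE-H\u00e4ndler.pdf"]

shown = repr(names)     # Python 3 repr keeps printable non-ASCII characters
escaped = ascii(names)  # ascii() replaces them with \x.. escape codes

print(shown)    # ['SSE-Künden, SSE-Händler.pdf']
print(escaped)  # ['SSE-K\xfcnden, SSE-H\xe4ndler.pdf']
```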

edited Sep 22 '11 at 07:06

answered Sep 22 '11 at 06:54

Mark Tolonen


Sorry, I didn't mention before: I am using Python 2.6 on Windows, with IPython. – Shashi Sep 22 '11 at 06:56
His question is how to display the unicode characters in their native form (non-byte format) – user183037 Sep 22 '11 at 06:56
+1 Using a unicode path does indeed work, interesting and non-obvious. – six8 Sep 22 '11 at 07:04
No, I tried it on Python 2.6.7 and I get the following error: UnicodeEncodeError: 'charmap' codec can't encode character u'\x81' in position 22: character maps to <undefined> – Shashi Sep 22 '11 at 07:17
@Shashi, interesting. Your filename is a Unicode string but contains the cp437 (US Windows console encoding) character value for ü. Was this file originally created on Windows? I created the file for the example above and the Unicode characters for ü and ä are `\xfc` and `\xe4`. – Mark Tolonen Sep 22 '11 at 07:36
No, I am not sure whether it was created on Windows or not. OK, on my side the Unicode characters for ü and ä are \x81 and \x84... – Shashi Sep 22 '11 at 07:38
@Shashi: Does the filename display correctly in Windows Explorer? – Mark Tolonen Sep 22 '11 at 07:43
I tried filename.encode("mbcs"), but then it drops the '\x81' and '\x84' characters, so the string shows as "SSE-Knden, SSE-Hndler.pdf", which changes its meaning. – Shashi Sep 22 '11 at 07:45
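Following the diagnosis in the comments that the name holds the cp437 byte values (\x81 for ü, \x84 for ä) as Unicode code points, one possible repair is sketched below. It assumes every non-ASCII character in the name got there this way; that assumption was not confirmed in the thread. latin-1 maps code points U+0000 to U+00FF one-to-one onto bytes, so encoding with it recovers the original bytes, and decoding those bytes as cp437 yields the intended characters.

```python
# The name as returned in the comments above: U+0081 and U+0084 are the
# cp437 byte values for ü and ä, not the Unicode characters themselves.
garbled = u"SSE-K\x81nden, SSE-H\x84ndler.pdf"

# Step 1: latin-1 turns each code point below U+0100 back into the same byte.
raw = garbled.encode("latin-1")

# Step 2: interpret those bytes with the code page they actually came from.
fixed = raw.decode("cp437")

print(fixed == u"SSE-K\u00fcnden, SSE-H\u00e4ndler.pdf")  # True
```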

score 1 · Answer 2 · answered Sep 22 '11 at 06:58


import os

for a, b, c in os.walk(path):
    for n in c:
        # assumes os.walk returned UTF-8 encoded byte strings
        print n.decode('utf-8')


six8


+1: This should work if his terminal session is set to display unicode. – user183037 Sep 22 '11 at 07:02

To set the Windows terminal to Unicode, see http://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console – six8 Sep 22 '11 at 07:15
This won't work if the file system doesn't use UTF-8, such as Windows. – Mark Tolonen Sep 22 '11 at 07:19
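To see why decode('utf-8') depends on the file system's encoding, the sketch below encodes the same name two ways and decodes both as UTF-8. It is an illustration built on the bytes from the question, not a claim about how any particular Windows file system stores names. The UTF-8 bytes round-trip, but the cp437 bytes fail, because 0x81 is a UTF-8 continuation byte with no lead byte before it.

```python
name = u"SSE-K\u00fcnden, SSE-H\u00e4ndler.pdf"

utf8_bytes = name.encode("utf-8")    # how a UTF-8 file system would store it
cp437_bytes = name.encode("cp437")   # the bytes from the question: \x81, \x84

print(utf8_bytes.decode("utf-8") == name)  # True

utf8_failed = False
try:
    cp437_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    # Decoding stops at 0x81, which cannot start a UTF-8 sequence.
    utf8_failed = True
    print("not valid UTF-8: " + exc.reason)
```

Decoding with the codec that actually produced the bytes (cp437 here) is what makes the result correct.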

score 0 · Answer 3 · answered Sep 22 '11 at 06:54


For writing to a file: http://docs.python.org/howto/unicode.html#reading-and-writing-unicode-data
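As a minimal sketch of what that HOWTO describes: io.open (available since Python 2.6) takes an encoding, so Unicode text is encoded on write and decoded on read. The file name names.txt and the temporary directory are hypothetical, chosen only for the example, and UTF-8 is one reasonable choice of encoding, not the only one.

```python
import io
import os
import shutil
import tempfile

name = u"SSE-K\u00fcnden, SSE-H\u00e4ndler.pdf"

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "names.txt")  # hypothetical output file

# Write the Unicode name; io.open encodes it as UTF-8 on the way out.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(name + u"\n")

# Read it back; io.open decodes the UTF-8 bytes into Unicode again.
with io.open(path, encoding="utf-8") as f:
    read_back = f.read().rstrip(u"\n")

shutil.rmtree(tmp)
print(read_back == name)  # True
```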


user183037

