3

To summarize: how do I print Unicode in a system-independent way to produce playing card symbols?

What am I doing wrong? I consider myself quite fluent in Python, yet I can't seem to get printing right!

# coding: utf-8
from __future__ import print_function
from __future__ import unicode_literals
import sys

symbols = ('♥','♦','♠','♣')
# red suits to stderr for IDLE
print(' '.join(symbols[:2]), file=sys.stderr)
print(' '.join(symbols[2:]))

sys.stdout.write(symbols) # also correct in IDLE
print(' '.join(symbols))

Printing to the console, which is the main concern for a console application, fails miserably though:

J:\test>chcp
Aktiivinen koodisivu: 850


J:\test>symbol2
Traceback (most recent call last):
  File "J:\test\symbol2.py", line 9, in <module>
    print(''.join(symbols))
  File "J:\Python26\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: character maps to <undefined>
J:\test>chcp 437
Aktiivinen koodisivu: 437

J:\test>d:\Python27\python.exe symbol2.py
Traceback (most recent call last):
  File "symbol2.py", line 6, in <module>
    print(' '.join(symbols))
  File "d:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2660' in position 0: character maps to <undefined>

J:\test>

So, all in all, I have a console application that works as long as you run it in IDLE rather than in the console.

I can of course generate the symbols myself with chr:

# correct symbols for cp850
print(''.join(chr(n) for n in range(3,3+4)))

But this looks like a very stupid way to do it. I do not want to write programs that only run on Windows or that have many special cases (like conditional compilation). I want readable code.

I do not mind which characters it outputs, as long as the result looks correct whether it runs on a Nokia phone, Windows or Linux. Unicode should do that, but it does not print correctly to the console.

Mark Tolonen
Tony Veijalainen
  • The Windows console prints the card symbols using normally unprintable characters, specifically `print '\x03\x04\x05\x06'`. cp437 (US Windows console) and cp850 themselves don't support those characters. See the note at the end of the table on this page: http://en.wikipedia.org/wiki/Code_page_850 – Mark Tolonen Nov 21 '10 at 06:30
  • I know this; see the code at the end of my revised post. "The C0 control range (0x00–0x1F hex) is mapped to graphics characters. The codes can assume their original function as controls (as they still do—typing "echo", space, control-G and then Enter causes the PC speaker to emit a beep—even on the command prompt on Windows XP), but in display, for example in a screen editor like MS-DOS edit, they show as graphics. The graphics are various, such as smiling faces, card suits and musical notes." They are there, so why do they not print as Unicode? – Tony Veijalainen Nov 21 '10 at 06:47

4 Answers

2

Whenever I need to output utf-8 characters, I use the following approach:

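import codecs
import sys

# Wrap stdout so that unicode strings are encoded as UTF-8 on write (Python 2)
out = codecs.getwriter('utf-8')(sys.stdout)

suit = u'♠'  # renamed from str to avoid shadowing the built-in

out.write(u"%s\n" % suit)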

This saves me an encode('utf-8') every time something needs to be sent to stdout/stderr.

Fredrik Pihl
1

In response to the updated question

Since all you want to do is print UTF-8 characters in CMD, you're out of luck. CMD does not support UTF-8:
Is there a Windows command shell that will display Unicode characters?
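
Given that limitation, and since the question says any characters will do as long as the output looks right, one option is to check whether the console encoding can represent the suits and fall back to plain letters when it cannot. This is only a minimal sketch: `suit_labels` and the fallback letters are hypothetical names chosen for illustration, not something from the question.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import sys

SUITS = ('♠', '♥', '♦', '♣')
ASCII_FALLBACK = ('S', 'H', 'D', 'C')  # hypothetical fallback letters


def suit_labels(encoding):
    """Return the suit symbols if `encoding` can represent them, else letters."""
    try:
        # A missing encoding (None) is treated like plain ASCII.
        ''.join(SUITS).encode(encoding or 'ascii')
    except (UnicodeEncodeError, LookupError):
        return ASCII_FALLBACK
    return SUITS


if __name__ == '__main__':
    # sys.stdout may be replaced by an object without an encoding attribute.
    print(' '.join(suit_labels(getattr(sys.stdout, 'encoding', None))))
```

With cp850 or cp437 as the console encoding this prints the letters instead of raising UnicodeEncodeError; on a UTF-8 terminal it prints the symbols.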

Old Answer

It's not totally clear what you're trying to do here; my best guess is that you want to write the encoded UTF-8 to a file.

Your problems are:

  1. symbols = ('♠','♥', '♦','♣'): while your file encoding may be UTF-8, unless you're using Python 3 your strings won't be Unicode strings by default; you need to prefix them with a small u:
    symbols = (u'♠', u'♥', u'♦', u'♣')

  2. Your str(arg) converts the unicode string back into a byte string; just leave it out, or use unicode(arg) to convert to a unicode string.

  3. The naming of .decode() may be confusing: it decodes bytes into a unicode string, but what you need is to encode the unicode string into UTF-8 bytes, so use .encode().

  4. You're not writing to the file in binary mode. Instead of open('test.txt', 'w') you need to use open('test.txt', 'wb') (notice the wb); this opens the file in binary mode, which is important on Windows.
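
The direction of .encode() and .decode() from point 3 can be sketched with a single symbol. This is only an illustration; the variable names `text` and `data` are made up for the example.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

text = '♠'                    # a unicode (text) string holding one code point
data = text.encode('utf-8')   # encode: text -> bytes (three bytes in UTF-8)
round_trip = data.decode('utf-8')  # decode: bytes -> text

assert data == b'\xe2\x99\xa0'
assert round_trip == text
```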

If we put all of this together we get:

# -*- coding: utf-8 -*-
from __future__ import print_function
import sys

symbols = (u'♠',u'♥', u'♦',u'♣')

print(' '.join(symbols))
print('Failure!')

def print(*args, **kwargs):
    # Python 2 replacement for print() that encodes every argument as UTF-8
    end = kwargs.get('end', '\n')
    sep = kwargs.get('sep', ' ')
    stdout = kwargs.get('file', sys.stdout)
    stdout.write(sep.join(unicode(arg).encode('utf-8') for arg in args))
    stdout.write(end)

print(*symbols)
print('Success!')
with open('test.txt', 'wb') as testfile:
    print(*symbols, file=testfile)

That happily writes the byte encoded UTF-8 to the file (at least on my Ubuntu box here).

Community
Ivo Wetzel
  • encode does not work; it only says: Traceback (most recent call last): File "J:\test\symbol.py", line 20, in print(*symbols) File "J:\test\symbol.py", line 14, in print stdout.write(sep.join(str(arg).encode('utf8') for arg in args)) File "J:\test\symbol.py", line 14, in stdout.write(sep.join(str(arg).encode('utf8') for arg in args)) UnicodeEncodeError: 'ascii' codec can't encode character u'\u2660' in position 0: ordinal not in range(128). Decode works but only in IDLE, not in the CMD console. – Tony Veijalainen Nov 20 '10 at 18:41
  • You haven't replaced your `str(arg)` call with `unicode(arg)`, so `encode` will fail on the non-unicode string. – Ivo Wetzel Nov 20 '10 at 18:43
  • Yes, but the result to file looks rubbish in IDLE "â™  ♥ ♦ ♣" – Tony Veijalainen Nov 20 '10 at 18:52
  • Sure it looks "rubbish", because now you have the byte representation of the UTF-8 string. What did you expect? If you want to get the symbols back, you need to read from the file and then use `.decode('utf-8')`. – Ivo Wetzel Nov 20 '10 at 18:57
  • Also you should use `"wb"` instead of `"w"` as the second parameter to `open` to write in binary mode to the file, my apologies for missing that. – Ivo Wetzel Nov 20 '10 at 18:58
  • I need the symbols to come out OK in the Windows text console, nothing else. The file part was just for reimplementing the file parameter, to get printing in IDLE to work without unicode symbols, like sys.stdout.write(). – Tony Veijalainen Nov 20 '10 at 19:33
  • Then please state that clearly in your question next time. Windows CMD does **not** support UTF-8. I've updated my answer to reflect that. – Ivo Wetzel Nov 20 '10 at 19:51
  • But the console's code pages, at least cp850, support the symbols too! See the Windows-only brute-force alternative I added at the end. – Tony Veijalainen Nov 20 '10 at 19:56
  • The Windows console displays normally unprintable control characters (0x00-0x1f) as graphics characters. You can see that the code page itself doesn't support the Unicode characters for the card suits by typing `print u'\u2665'` (♥) and seeing the Unicode encoding error. But print the byte for CTRL-C `\x03` and you will see the heart. – Mark Tolonen Nov 21 '10 at 06:36
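
The control-code trick described in these comments can be sketched as a small translation table. This is an illustration only: `CONSOLE_SUITS` and `to_console_codes` are hypothetical names, and whether the bytes 0x03 to 0x06 are actually drawn as card suits depends on the console and font in use, so treat the result as Windows-console-specific.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Glyphs that code page 437/850 consoles draw for control codes 0x03-0x06
CONSOLE_SUITS = {'♥': '\x03', '♦': '\x04', '♣': '\x05', '♠': '\x06'}


def to_console_codes(text):
    """Replace each suit symbol with its control code; keep everything else."""
    return ''.join(CONSOLE_SUITS.get(ch, ch) for ch in text)


converted = to_console_codes('♥ ♦ ♣ ♠')
```

The converted string contains only ASCII-range characters, so encoding it for cp437 or cp850 no longer raises UnicodeEncodeError.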
1

Use Unicode strings and the codecs module:

Either:

# coding: utf-8
from __future__ import print_function
import sys
import codecs

symbols = (u'♠',u'♥',u'♦',u'♣')

print(u' '.join(symbols))
print(*symbols)
with codecs.open('test.txt','w','utf-8') as testfile:
    print(*symbols, file=testfile)

or:

# coding: utf-8
from __future__ import print_function
from __future__ import unicode_literals
import sys
import codecs

symbols = ('♠','♥','♦','♣')

print(' '.join(symbols))
print(*symbols)
with codecs.open('test.txt','w','utf-8') as testfile:
    print(*symbols, file=testfile)

No need to re-implement print.

Mark Tolonen
0

UTF-8 in the Windows console is a long and painful story.

You can read issue 1602 and issue 6058 and have something that works, more or less, but it's fragile.

Let me summarise:

  • add 'cp65001' as an alias for 'utf8' in Lib/encodings/aliases.py
  • select Lucida Console or Consolas as your console font
  • run chcp 65001
  • run python
tzot
  • I know those, but I still have not managed to get the console to print anything intelligible. Maybe I messed things up at some point in the past while trying to do the same (my default was changed through the registry to cp1252; I forgot where). Still no playing card symbols. – Tony Veijalainen Nov 20 '10 at 19:45