Printing all unicode characters in Python

Question

I've written some code to create all 4-digit combinations of the hexadecimal system, and now I'm trying to use that to print out all the Unicode characters that are associated with those values. Here's the code I'm using to do this:

char_list =["0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F"]
pairs = []
all_chars = []

# Construct pairs list
for char1 in char_list:
    for char2 in char_list:
        pairs.append(char1 + char2)

# Create every combination of unicode characters ever
for pair1 in pairs:
    for pair2 in pairs:
        all_chars.append(pair1 + pair2)

# Print all characters
for code in all_chars:
    expression = "u'\u" + code + "'"
    print "{}: {}".format(code,eval(expression))

And here is the error message I'm getting:

Traceback (most recent call last):
  File "C:\Users\andr7495\Desktop\unifun.py", line 18, in <module>
    print "{}: {}".format(code,eval(expression))
UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 0: ordinal not in range(128)

The exception is thrown when the code tries to print u"\u0080"; however, I can do this in the interactive interpreter without a problem.

I've tried casting the results to unicode and specifying to ignore errors, but it's not helping. I feel like I'm missing a basic understanding about how unicode works, but is there anything I can do to get my code to print out all valid unicode expressions?

`u"\u0080"` is the [control character](http://www.fileformat.info/info/unicode/char/0080/index.htm)... maybe you can't print that? — tmdavison, Oct 09 '15 at 16:39
@tom especially if he is printing to the standard windows cmd.exe prompt :P — Joran Beasley, Oct 09 '15 at 16:40
unrelated: to display characters outside a `chcp` encoding range in Windows console, install `win-unicode-console` package. See [Python, Unicode, and the Windows console](http://stackoverflow.com/a/32176732/4279) — jfs, Oct 10 '15 at 08:14
Possible duplicate of [How can I print all unicode characters?](https://stackoverflow.com/questions/7959740/how-can-i-print-all-unicode-characters) — Alex Hall, Aug 10 '17 at 16:48

Michał Šrajer · Answer 1 · 2015-10-09T17:08:54.840

15

import sys
for i in xrange(sys.maxunicode):
    print unichr(i)

edited Oct 09 '15 at 17:08

answered Oct 09 '15 at 16:41

or even better: `sys.maxunicode + 1` (to treat `U+10FFFF` non-character as other non-characters). – jfs Oct 10 '15 at 08:12
On my system (Mac) this displays many copies of the same glyph that means "this font doesn't have that glyph in this codepage" (YMMV on how or whether that character even displays in your browser: on Firefox on Mac it prints as a question mark in a block; on Firefox on Windows it displays as hex digits in a block), along with very many other unique printable glyphs. How would I filter for glyphs that don't exist in the current display font + code page? I could only imagine a custom-coded solution (executable) using freetype :/ – Alex Hall Aug 10 '17 at 16:51
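
A partial answer to the filtering question above, as a minimal Python 3 sketch: it does not inspect the display font at all (that would still need something like freetype), it only skips code points whose Unicode general category says they have no glyph of their own. The names `SKIPPED_CATEGORIES`, `is_probably_printable` and `printable_chars` are hypothetical, and the choice of categories to skip is an assumption, not a standard definition of "printable".

```python
import sys
import unicodedata

# Assumed set of categories to skip: control (Cc), surrogate (Cs),
# unassigned (Cn) and private use (Co).
SKIPPED_CATEGORIES = {"Cc", "Cs", "Cn", "Co"}


def is_probably_printable(ch):
    """Return True if the character's general category is not in the skipped set."""
    return unicodedata.category(ch) not in SKIPPED_CATEGORIES


def printable_chars(limit=sys.maxunicode + 1):
    """Lazily yield characters below `limit` that pass the category filter."""
    return (chr(cp) for cp in range(limit) if is_probably_printable(chr(cp)))
```

Whether a character that passes this filter actually renders still depends on the font and the terminal.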

score 0 · Answer 2 · answered Oct 09 '15 at 16:38

It is likely a problem with your terminal (cmd.exe is notoriously bad at this): most of the time when you "print", you are printing to a terminal, which then tries to encode the output. If you run your code in IDLE or some other environment that can render Unicode, you should see the characters. Also, you should not use eval; try this instead:

for uni_code in range(...):
    print hex(uni_code),unichr(uni_code)
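
For readers on Python 3, the same idea can be sketched without `eval` and without building the hex strings by hand. This assumes the 4-digit range from the question (U+0000 to U+FFFF); `iter_bmp_chars` is a hypothetical helper name, and skipping surrogates is a choice made here, not something the answer above prescribes.

```python
def iter_bmp_chars(start=0x0000, stop=0x10000):
    """Yield (code, char) pairs, where code is a 4-digit uppercase hex string."""
    for cp in range(start, stop):
        # Surrogates (U+D800-U+DFFF) are not characters on their own
        # and cannot be encoded by most codecs, so skip them.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        yield format(cp, "04X"), chr(cp)


if __name__ == "__main__":
    # Print a small slice; the full range may still fail on a terminal
    # whose encoding cannot represent every character.
    for code, ch in iter_bmp_chars(0x0041, 0x0045):
        print("{}: {}".format(code, ch))
```

`format(cp, "04X")` produces the same strings as the nested loops in the question, and `chr(cp)` replaces the `eval` call.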

score 0 · Accepted Answer · answered Oct 09 '15 at 17:29

You're trying to format a Unicode character into a byte string. You can remove the error by using a Unicode string instead:

print u"{}: {}".format(code,eval(expression))
      ^

The other answers are better at simplifying the original problem, however; you're definitely doing things the hard way.
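
The underlying failure is an implicit ASCII encode: in Python 2, formatting a Unicode character into a byte string forces it through the `ascii` codec. The following Python 3 sketch performs that encode step explicitly to show where the message in the traceback comes from; the variable names are illustrative only.

```python
ch = "\u0080"

# Encoding to ASCII fails for any code point above 127.
try:
    ch.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# A codec that covers all of Unicode succeeds.
utf8_bytes = ch.encode("utf-8")

# Alternatively, ask the codec to substitute what it cannot represent.
replaced = ch.encode("ascii", errors="replace")

print(error_message)
print(utf8_bytes, replaced)
```

The same distinction explains why output can work in one environment and fail in another: the result depends on which encoding the output stream uses.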

score 0 · Answer 4 · edited Aug 03 '17 at 22:39

Here's a rewrite of examples in this article that saves the list to a file.

Python 3.x:

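import sys

txtfile = "unicode_table.txt"
print("creating file: " + txtfile)
# errors="ignore" silently drops code points that UTF-16 cannot encode
# (lone surrogates, U+D800-U+DFFF) instead of raising an exception.
with open(txtfile, "w", encoding="utf-16", errors="ignore") as f:
    # + 1 so that the last code point, U+10FFFF, is included
    for uc in range(sys.maxunicode + 1):
        line = "%s %s" % (hex(uc), chr(uc))
        print(line, file=f)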

answered Jun 09 '17 at 12:04

Bimo
