I've written some code to create all 4-digit combinations of the hexadecimal system, and now I'm trying to use that to print out all the Unicode characters associated with those values. Here's the code I'm using to do this:

char_list =["0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F"]
pairs = []
all_chars = []

# Construct pairs list
for char1 in char_list:
    for char2 in char_list:
        pairs.append(char1 + char2)

# Create every combination of unicode characters ever
for pair1 in pairs:
    for pair2 in pairs:
        all_chars.append(pair1 + pair2)

# Print all characters
for code in all_chars:
    expression = "u'\u" + code + "'"
    print "{}: {}".format(code,eval(expression))

And here is the error message I'm getting:

Traceback (most recent call last):
  File "C:\Users\andr7495\Desktop\unifun.py", line 18, in <module>
    print "{}: {}".format(code,eval(expression))
UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 0: ordinal not in range(128)

The exception is thrown when the code tries to print u"\u0080". However, I can do this in the interactive interpreter without a problem.

I've tried casting the results to unicode and specifying to ignore errors, but it's not helping. I feel like I'm missing a basic understanding about how unicode works, but is there anything I can do to get my code to print out all valid unicode expressions?

Automatic Bazooty
  • Try to avoid `eval`, especially in a loop. – Michał Šrajer Oct 09 '15 at 16:36
  • `u"\u0080"` is a [control character](http://www.fileformat.info/info/unicode/char/0080/index.htm)... maybe you can't print that? – tmdavison Oct 09 '15 at 16:39
  • @tom especially if he is printing to the standard windows cmd.exe prompt :P – Joran Beasley Oct 09 '15 at 16:40
  • unrelated: to display characters outside a `chcp` encoding range in Windows console, install `win-unicode-console` package. See [Python, Unicode, and the Windows console](http://stackoverflow.com/a/32176732/4279) – jfs Oct 10 '15 at 08:14
  • Possible duplicate of [How can I print all unicode characters?](https://stackoverflow.com/questions/7959740/how-can-i-print-all-unicode-characters) – Alex Hall Aug 10 '17 at 16:48

4 Answers

import sys

# xrange stops one short of sys.maxunicode; use sys.maxunicode + 1 to include it
for i in xrange(sys.maxunicode):
    print unichr(i)
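
A loop over every code point also emits control characters such as U+0080, surrogates, and unassigned code points. As a minimal sketch (written in Python 3 syntax, so `chr` stands in for `unichr`), the standard `unicodedata` module can be used to skip those. The helper name `printable_code_points` and the set of skipped categories are assumptions chosen for illustration. This does not detect whether the current font has a glyph for a character.

```python
import sys
import unicodedata

# General categories skipped in this sketch (an illustrative choice):
# Cc = control, Cs = surrogate, Co = private use, Cn = unassigned.
SKIPPED_CATEGORIES = {"Cc", "Cs", "Co", "Cn"}


def printable_code_points(start=0, stop=sys.maxunicode + 1):
    """Yield (code point, character) pairs whose category is not skipped."""
    for cp in range(start, stop):
        ch = chr(cp)
        if unicodedata.category(ch) not in SKIPPED_CATEGORIES:
            yield cp, ch
```

For example, `for cp, ch in printable_code_points(0x20, 0x7F): print("U+{:04X}: {}".format(cp, ch))` lists the printable ASCII range.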
Michał Šrajer
  • or even better: `sys.maxunicode + 1` (to treat `U+10FFFF` non-character as other non-characters). – jfs Oct 10 '15 at 08:12
  • On my system (Mac) this displays many of the same glyph that means "this font doesn't have that glyph in this codepage" (YMMV on how or whether that character even displays in your browser: on Firefox on Mac that's printing as a question mark in a block; on Firefox on Windows it displays as hex digits in a block), with very many other unique printable glyphs. How would I filter for glyphs that don't exist in the current display font + code page? I could only imagine a custom-coded solution (executable) using FreeType :/ – Alex Hall Aug 10 '17 at 16:51

It is likely a problem with your terminal (cmd.exe is notoriously bad at this). Most of the time when you "print", you are printing to a terminal, which then has to encode the text. If you run your code in IDLE or some other environment that can render Unicode, you should see the characters. Also, you should not use eval; try this instead:

for uni_code in range(...):
    print hex(uni_code),unichr(uni_code)
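
To illustrate the encoding step described above, here is a minimal sketch in Python 3 syntax. It escapes any character the target codec cannot represent instead of raising `UnicodeEncodeError`. The helper name `encodable_text` is hypothetical, and `"ascii"` is only a stand-in for whatever encoding the terminal actually uses.

```python
def encodable_text(text, encoding, errors="backslashreplace"):
    """Return text with characters the codec cannot encode escaped.

    Encoding with errors="backslashreplace" turns unencodable characters
    into escape sequences; decoding again yields a string that is safe
    to write to a stream using that encoding.
    """
    return text.encode(encoding, errors).decode(encoding)


# Strict encoding reproduces the kind of error shown in the question.
try:
    "\u0080".encode("ascii")
    strict_encoding_failed = False
except UnicodeEncodeError:
    strict_encoding_failed = True
```

One possible use is `print(encodable_text(text, sys.stdout.encoding or "ascii"))`, which trades an exception for a visible escape sequence.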
Joran Beasley

You're trying to format a Unicode character into a byte string. You can remove the error by using a Unicode string instead:

print u"{}: {}".format(code,eval(expression))
      ^

The other answers are better at simplifying the original problem, however; you're definitely doing things the hard way.
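
As a rough sketch of the simpler route, a four-digit hex string can be converted to its character with `int(code, 16)` and no `eval` at all. This is written in Python 3 syntax (Python 2 would use `unichr` in place of `chr`), and the helper names `char_from_hex` and `hex_codes` are made up for illustration.

```python
def char_from_hex(code):
    """Convert a hex string such as "0080" to the corresponding character."""
    return chr(int(code, 16))


def hex_codes(limit=0x10000):
    """Yield four-digit uppercase hex strings "0000", "0001", ... below limit."""
    for value in range(limit):
        yield "{:04X}".format(value)
```

`hex_codes()` produces the same strings as the nested loops in the question, in the same order, without building the intermediate `pairs` list.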

Mark Ransom

Here's a rewrite of examples in this article that saves the list to a file.

Python 3.x:

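import sys

txtfile = "unicode_table.txt"
print("creating file: " + txtfile)
# errors="ignore" silently drops characters UTF-16 cannot encode (lone surrogates)
with open(txtfile, "w", encoding="utf-16", errors="ignore") as f:
    for uc in range(sys.maxunicode + 1):  # + 1 so that U+10FFFF is included
        line = "%s %s" % (hex(uc), chr(uc))
        print(line, file=f)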
Bimo