
I wanted to loop over Unicode characters in Python like this:

hex_list = "0123456789abcdef"
for _1 in hex_list:
    for _2 in hex_list:
        for _3 in hex_list:
            for _4 in hex_list:
                my_char = r"\u" + _1 + _2 + _3 + _4
                print(my_char)

As expected this printed out:

\u0000
\u0001
...
\uffff

Then I tried to change the code above to print the corresponding characters instead of the escape sequences:

hex_list = "0123456789abcdef"
for _1 in hex_list:
    for _2 in hex_list:
        for _3 in hex_list:
            for _4 in hex_list:
                my_char = r"\u" + _1 + _2 + _3 + _4
                eval("print(my_char)")

But this outputs the same as the code before.

hex_list = "0123456789abcdef"
for _1 in hex_list:
    for _2 in hex_list:
        for _3 in hex_list:
            for _4 in hex_list:
                eval("print(" + r"\u" + f"{_1}{_2}{_3}{_4})")

And something like this raises the following error message:

eval("print(" + r"\u" + f"{_1}{_2}{_3}{_4})")
  File "<string>", line 1
    print(\u0000)
                ^
SyntaxError: unexpected character after line continuation character

What would make this code work as intended?

  • Fiddling with `eval`ing string literals smells like an [XY problem](https://meta.stackexchange.com/q/66377/478746). Why not use `chr(codepoint)`? – Brian61354270 Feb 21 '23 at 15:25
  • @Brian To be clear, `codepoint` needs to be an int, which can be got with `int(f"{_1}{_2}{_3}{_4}", 16)` – wjandrea Feb 21 '23 at 15:27
  • Python strings are Unicode. All characters are Unicode characters. Unicode isn't some kind of escape sequence, it's a way of mapping characters to bytes. – Panagiotis Kanavos Feb 21 '23 at 15:27
  • Also, note that `eval("print(my_char)")` is the same as `print(my_char)` it's just printing the string contents of the variable `my_char` – Brian61354270 Feb 21 '23 at 15:27
  • Why are you using nested loops in the first place when you could just be looping over numbers? `for codepoint in range(0x10000): ...`. Or you could at least use [`product`](https://docs.python.org/3/library/itertools.html#itertools.product) instead of a nested loop. – wjandrea Feb 21 '23 at 15:28
  • The error is telling you that the *escape sequence* you constructed is invalid. It says nothing about the NUL character you tried to create – Panagiotis Kanavos Feb 21 '23 at 15:28
  • Given the *fact* that Python strings are Unicode, you can use [chr](https://docs.python.org/3/library/functions.html#chr) to convert a Unicode code point to a string with that character, e.g. `print(chr(1081))`. You can iterate from `0` to whatever number you want to generate characters – Panagiotis Kanavos Feb 21 '23 at 15:31
  • "Mandatory" background reading: [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/) – Brian61354270 Feb 21 '23 at 15:35
  • Why are you expecting `\u0000` to work? Strings need to be quoted, i.e. `'\u0000'`. Did you just forget to add the quote marks? `eval(fr"print('\u{_1}{_2}{_3}{_4}')")` – wjandrea Feb 21 '23 at 15:35
  • You aren't iterating over Unicode characters in the original code. You are iterating over regular ASCII characters and constructing strings that look like escape sequences used to indicate Unicode characters in string literals. Two *very* different things. – chepner Feb 21 '23 at 15:36
  • Does this answer your question? [Process escape sequences in a string in Python](https://stackoverflow.com/questions/4020539/process-escape-sequences-in-a-string-in-python) – Abdul Aziz Barkat Feb 21 '23 at 15:55
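
The comments above point at two ways to turn a string that looks like an escape sequence into the character it names. Here is a minimal sketch of both, using one fixed example string rather than the loop from the question. Note that `ast.literal_eval` is swapped in for the `eval` call suggested in the comments, because it only parses literals; the variable name `escaped` is just for illustration.

```python
import ast

escaped = r"\u0041"  # six characters: backslash, u, 0, 0, 4, 1

# Wrap the text in quotes so it becomes a string literal, then parse it.
from_literal = ast.literal_eval(f"'{escaped}'")

# Or process the escape sequence directly with the unicode_escape codec.
from_codec = escaped.encode("ascii").decode("unicode_escape")

print(from_literal)  # A
print(from_codec)    # A
```

Both routes only work when the text is a valid escape sequence, which is why the unquoted `print(\u0000)` in the question fails.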

2 Answers


Python strings are already Unicode. Unicode isn't some kind of escape sequence; it's a standard that assigns every character a numeric code point. Encodings such as UTF-8 then map those code points to bytes.

Given that fact, you can use `chr` to convert a Unicode code point to a string containing that character, e.g. `print(chr(1081))`. As the function's docs say:

Return the string representing a character whose Unicode code point is the integer i. For example, chr(97) returns the string 'a', while chr(8364) returns the string '€'. This is the inverse of ord().

The valid range for the argument is from 0 through 1,114,111 (0x10FFFF in base 16).
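
To connect this to the four hex digits built in the question, here is a minimal sketch. The helper name `char_from_hex` is made up for illustration; `int(..., 16)` parses the digits into an integer and `chr` turns that integer into the character.

```python
def char_from_hex(hex_digits: str) -> str:
    """Return the character whose code point is written as hex digits.

    Hypothetical helper, not part of the standard library.
    """
    return chr(int(hex_digits, 16))


print(char_from_hex("0061"))         # a
print(ord(char_from_hex("20ac")))    # 8364
print(char_from_hex("20ac") == "€")  # True
```

No `eval` and no escape sequences are involved, so there is nothing to quote or escape.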

A simple loop can generate every valid code point. Actually printing them is another matter:

# 0x110000 == 1114112, one past the highest code point, U+10FFFF
for i in range(0x110000):
    print(chr(i))

Running this on my machine eventually fails with:

UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

U+D800 is a surrogate code point. Surrogates can't be encoded as UTF-8, which is what my terminal uses, so that value can't be printed.
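
One way around this, sketched under the assumption that simply skipping the surrogate range U+D800 to U+DFFF is acceptable, is to filter those code points out before printing. The generator name `iter_encodable_chars` is hypothetical.

```python
def iter_encodable_chars(start=0, stop=0x110000):
    """Yield chr(i) for i in [start, stop), skipping surrogates.

    Hypothetical helper for illustration only.
    """
    for codepoint in range(start, stop):
        if 0xD800 <= codepoint <= 0xDFFF:
            continue  # surrogate code points cannot be encoded as UTF-8
        yield chr(codepoint)


# Show a small ASCII slice; iterating the full range works the same way.
print("".join(iter_encodable_chars(0x41, 0x47)))  # ABCDEF
```

Whether every remaining character displays properly still depends on the terminal and its fonts.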

Panagiotis Kanavos

I'd recommend using itertools in this case, and then bytearray.fromhex (as shown here), e.g.,

from itertools import product

for comb in product("0123456789abcdef", repeat=4):
    # comb is a tuple of four hex digits; join them before parsing
    print(bytearray.fromhex("".join(comb)).decode())

although this raises a UnicodeDecodeError as soon as the bytes aren't valid UTF-8, a failure similar to the one in @Panagiotis's answer. To get around the error you can use a try ... except block, e.g.:

for comb in product("0123456789abcdef", repeat=4):
    try:
        print(bytearray.fromhex("".join(comb)).decode())
    except UnicodeDecodeError:
        # skip byte pairs that aren't valid UTF-8
        pass
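
One caveat: `.decode()` with no argument assumes UTF-8, so the two bytes are read as a UTF-8 byte sequence rather than as a single code point. A possible variation, assuming the goal is to treat the four hex digits as one 16-bit code unit, decodes them as UTF-16 big-endian instead. The function name `chars_from_hex_digits` is made up for illustration.

```python
from itertools import product


def chars_from_hex_digits(digits="0123456789abcdef"):
    """Yield the character for each four-digit hex string, in order.

    Hypothetical helper; lone surrogates are skipped because they
    cannot be decoded on their own.
    """
    for comb in product(digits, repeat=4):
        try:
            # Two bytes, read as a single big-endian UTF-16 code unit.
            yield bytes.fromhex("".join(comb)).decode("utf-16-be")
        except UnicodeDecodeError:
            continue


chars = list(chars_from_hex_digits())
print(chars[0x41])  # A
print(len(chars))   # 63488
```

The count is 65536 minus the 2048 surrogate values, so the index of each character below U+D800 equals its code point.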
Matt Pitkin