I simply want to print out all bytes from 0 to 255 as an ASCII character list string, such that if that byte cannot be decoded to an ASCII character, it is simply not shown in the list string.

I have seen "How can we print a list of all printable ASCII characters to the console using python?", but its answer manually limits the byte values considered for printing in the "for" loop to the "printable" ASCII values. I would like to "outsource" that to the Python 3 encoder/decoder, if possible.

An example: as a first attempt, this works:

$ python3 -c 'arr=[ix for ix in range(256)]; print(bytes(arr))'
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./
0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\
x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\x
b1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd
4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7
\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

... except I get \x** for the "unprintable characters", whereas I'd want them to be simply left out.

So I tried .decode:

$ python3 -c 'arr=[ix for ix in range(256)]; print(bytes(arr).decode())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

$ python3 -c 'arr=[ix for ix in range(256)]; print(bytes(arr).decode("ascii"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 128: ordinal not in range(128)

In desperation, I also tried:

$ python3 -c 'arr=[ix for ix in range(256)]; print(bytes(arr).encode("ascii"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'. Did you mean: 'decode'?

So, of course I could try piping the output of python3 -c 'arr=[ix for ix in range(256)]; print(bytes(arr))' into some regex that will filter out \x(character)(character), as in:

$ \python3 -c 'arr=[ix for ix in range(256)]; print(bytes(arr))' | sed 's/\\x..//g'
b'\t\n\r !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~'

... but that seems like overkill - I should be able to set up encode/decode to give me this directly from Python, shouldn't I?

So, what do I need to do with encode/decode, to print out a string with only the printable ASCII characters, from a byte array with all of the 8-bit (byte) values?

sdbbs
  • 1) ASCII only goes up to 127, so you don't need to go up to 256 at all. This is known and doesn't need to be tested programmatically. 2) There's a difference between "can be interpreted as ASCII" and "is printable". Every byte from 0 to 127 is valid ASCII. Not every character in the range 0 to 127 is printable per se. You only want to check for the "printability" of characters. – deceze Aug 31 '23 at 04:30
  • `$ python3 -c 'import string; print(string.printable)'`… – deceze Aug 31 '23 at 04:33
  • For completeness: `bytes(arr).decode()` fails because it's implicitly trying to decode the bytes as UTF-8, and they're simply not valid UTF-8. `bytes(arr).decode("ascii")` fails because, again, ASCII only defines bytes up to 127, and it cannot decode anything in the range 128-255. `bytes(arr).encode("ascii")` fails because bytes are already encoded and encoding them again has no meaning (the type doesn't even define that method). – deceze Aug 31 '23 at 04:37
  • Encoding/decoding isn't the solution in the first place, as you want to _exclude_ "unprintable" characters, not just present them in some other form. – deceze Aug 31 '23 at 04:38
  • Thanks @deceze - "Encoding/decoding isn't the solution" seems to be the right answer, I've lost sight of that - however, I wouldn't have been able to grok that without the "for completeness" comment as well; so all the comments are much appreciated! – sdbbs Aug 31 '23 at 05:03
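
To make the point from the comments concrete, here is a minimal sketch of filtering rather than decoding. The variable names (`all_bytes`, `ascii_text`, `printable`, `printable_ws`) are made up for illustration. It relies on `errors="ignore"` to drop the bytes 128-255 that the ASCII codec cannot decode, and then on `str.isprintable()` or `string.printable` to drop the control characters, which are valid ASCII and therefore survive the decode step.

```python
import string

# All 256 byte values, as in the question.
all_bytes = bytes(range(256))

# errors="ignore" silently drops bytes the ASCII codec cannot decode (128-255).
# Control characters such as \x00 or \x7f are valid ASCII, so they are still here.
ascii_text = all_bytes.decode("ascii", errors="ignore")

# Keep only what str.isprintable() accepts: space through "~".
printable = "".join(ch for ch in ascii_text if ch.isprintable())
print(printable)

# Variant that also keeps ASCII whitespace (\t, \n, \r, \x0b, \x0c),
# using the string.printable constant mentioned in the comments.
printable_ws = "".join(ch for ch in ascii_text if ch in string.printable)
```

The decode step alone does not give the desired result. It only removes the non-ASCII half, so the printability filter is what does the actual work. The second variant is close to the `sed` output in the question, except that it also keeps `\x0b` and `\x0c`, which `string.printable` counts as whitespace.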
