
I have made a Python script to generate random binary strings, then convert those into ASCII.

from random import *

def decode(binary):
    # credit to mhawke from stack overflow
    return ''.join(chr(int(binary[i*8:i*8+8],2)) for i in range(len(binary)//8))
    
def generate_random_binary(chars=None):
    if not chars:
        chars = randint(10, 20)
    r = ''
    for i in range(chars * 5):
        num = choice(['0', '1'])
        r = r + num
    return r

generated = generate_random_binary(64)
decoded = decode(generated)
decoded = eval("r'''"+decoded.replace('\0', '')+"'''")
print(decoded.replace('\n', ''))

Sometimes I get output with newlines, even though I tried to remove those. Is there something I'm missing?

Here's an example of my output (run in Repl.it):

ìîù_½   Ý05!d­(óÞÉ|½b°L³µ¬
                          H}N¸'Ä
  • I think the question needs some clarification. Usually, when you generate a 'random binary string', you do not try to print it. If you want to generate a random string of printable characters, that is a different problem. Could you clarify if you want to generate a "random binary string" for printing or a "random string of characters that can be printed, exclusive of CR, LF, BEL, and non-printable characters"? – Gardener Nov 19 '20 at 16:38
  • I'm only trying to print it so I can guarantee that it's working. I think you can figure out what I'm trying to do by looking at the code. –  Nov 19 '20 at 17:31
  • That makes sense. If you are printing it for debugging purposes, then it is best to map all non-printable characters, including CR and LF, to a '.' or ' ' before passing the string to print(). This is what is commonly done when printing a hex dump alongside the actual characters. – Gardener Nov 19 '20 at 19:38
  • @Gardener: For debugging purposes, I'd just `print(repr(decoded))`; that'll show you the `str` literal form with string escapes instead of special characters like newlines and carriage returns and what have you. In this case, they should really be making a `bytes` object anyway (that's how you store raw random bytes) and it would print its own `repr` by default. The construction would be a little simpler too: `return bytes(int(binary[i:i+8], 2) for i in range(0, len(binary), 8))` (I took the liberty of removing the need for multiplication entirely). – ShadowRanger Nov 19 '20 at 20:59
  • @ShadowRanger Using `repr()` is a great way to go! Much better than my beginner's solution. The output on my machine: `'\x05\x07Òo§ô\x88x\x07\x15k~OÄÕ-z\x90[ifô«v\x1cQpew]ö`[ô5\x9b\xad\x0fÞè'` I suppose this allows you to see the hex values of the non-printables. If you were printing a hex dump with a mapped column, this might be a little hard on the formatting, but repr() is a lot better and faster. – Gardener Nov 19 '20 at 21:10
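Putting ShadowRanger's and Gardener's suggestions together, a minimal sketch might look like this. It assumes you actually want raw random bytes rather than text; `random_bits`, `decode_to_bytes` and `printable_view` are hypothetical names, not from the question. It uses 8 bits per byte, whereas the question's `generate_random_binary` produces `chars * 5` bits that `decode` then reads 8 at a time (so `generate_random_binary(64)` yields 320 bits, i.e. 40 characters).

```python
import random

def random_bits(num_bytes):
    # Hypothetical helper: 8 random bits per byte, so decoding yields exactly num_bytes bytes.
    return ''.join(random.choice('01') for _ in range(num_bytes * 8))

def decode_to_bytes(binary):
    # Take the bit string 8 characters at a time; each slice becomes one byte (0-255).
    return bytes(int(binary[i:i + 8], 2) for i in range(0, len(binary), 8))

def printable_view(data):
    # Hex-dump style view: keep printable ASCII (0x20-0x7E), show everything else as '.'.
    return ''.join(chr(b) if 0x20 <= b <= 0x7E else '.' for b in data)

raw = decode_to_bytes(random_bits(16))
print(raw)                  # a bytes object prints its own repr: b'...' with \x.. escapes
print(printable_view(raw))  # one character per byte, control bytes shown as '.'
```

`print(raw)` shows exact byte values, while `printable_view` never emits CR, LF or any other control byte, so it stays on one line. Unlike the question's `decode`, a trailing slice shorter than 8 bits is parsed as a smaller number instead of being dropped.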

2 Answers


I think it's LF and CR: `replace('\n', '')` only removes LF (line feed), so any CR (carriage return, `\r`) characters are still there.

See here http://www.asciitable.com/
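To see the difference, compare `replace('\n', '')` with removing every line boundary Python knows about. This is a sketch; `strip_line_breaks` is a hypothetical helper, and the characters it removes are the ones `str.splitlines()` treats as line boundaries.

```python
def strip_line_breaks(text):
    # str.splitlines() treats all of these as line boundaries:
    # \n, \r, \r\n, \v (\x0b), \f (\x0c), \x1c, \x1d, \x1e, \x85, \u2028, \u2029.
    # Joining the pieces back together removes every one of them.
    return ''.join(text.splitlines())

sample = 'ab\rcd\x0bef\x85gh\nij'
print(repr(sample.replace('\n', '')))   # 'ab\rcd\x0bef\x85ghij': only the LF is gone
print(repr(strip_line_breaks(sample)))  # 'abcdefghij'
```

A terminal can still react to control characters that `splitlines()` does not count as line breaks, such as backspace (`\x08`) or escape (`\x1b`), so for debugging output `repr()` remains the safer view.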

anch2150
  • Even with using replace('\r', ''), I got this: `hVÑh xk< ËrÁ»:bð9£^ÒNéØÐ 9 rq22¨ÕV` (Four lines) –  Nov 19 '20 at 17:29
  • @Bradley: Print the `repr` of the string as well as the string itself. You'll be able to see the string escapes that correspond to the apparent newline(s). There are other ASCII characters that some terminals might interpret as newlines (e.g. form feed, vertical tab). I strongly suspect this is [an XY problem](https://meta.stackexchange.com/q/66377/322040), and you should really just be using `random.choices` on a sequence of all the characters you want to accept, rather than making random bytes and discarding the ones that you don't want. – ShadowRanger Nov 19 '20 at 20:56
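A sketch of the `random.choices` approach suggested above, assuming the acceptable characters are ASCII letters, digits and punctuation. The `ALLOWED` set and the `random_printable` name are hypothetical; substitute whatever alphabet you actually want. Since control characters are never generated, nothing needs to be filtered out afterwards.

```python
import random
import string

# Hypothetical alphabet: ASCII letters, digits and punctuation (no whitespace or control characters).
ALLOWED = string.ascii_letters + string.digits + string.punctuation

def random_printable(length):
    # random.choices samples with replacement, so characters may repeat.
    return ''.join(random.choices(ALLOWED, k=length))

print(random_printable(40))
```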

I think it's best to replace the non-printable characters with a '.' or a space ' '.

from random import *
import unicodedata

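printable = {'Lu', 'Ll'}  # Unicode categories to keep: uppercase and lowercase letters


def replace_nonprintable_with_period(s):
    # Keep cased letters; replace every other character with '.'.
    return ''.join(c if unicodedata.category(c) in printable else '.'
                   for c in s)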
#credit to https://stackoverflow.com/a/93557/4983398

def decode(binary):
    # credit to mhawke from stack overflow
    return ''.join(chr(int(binary[i * 8:i * 8 + 8], 2)) for i in range(len(binary) // 8))


def generate_random_binary(chars=None):
    if not chars:
        chars = randint(10, 20)
    r = ''
    for i in range(chars * 5):
        num = choice(['0', '1'])
        r = r + num
    return r


generated = generate_random_binary(64)
decoded = decode(generated)
decoded = eval("r'''" + decoded.replace('\0', '') + "'''")
print(replace_nonprintable_with_period(decoded))

Sample output:

.Ä.Ô..WE..nuO.Ä...çò.Á.......Àã..c...Ñ..

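`unicodedata.category` returns the Unicode general category of a character (for example `'Lu'` for an uppercase letter or `'Cc'` for a control character), which makes it a convenient way to classify characters in Python. See this article for more details.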
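The filter above keeps only cased letters (`Lu`, `Ll`), so digits and punctuation also become '.'. A hypothetical variant, sketched below, looks only at the first letter of the category and keeps letters, numbers, punctuation and symbols; `KEEP_MAJOR` and `replace_nonprintable` are illustrative names, not part of the answer.

```python
import unicodedata

# Show the general category of a few sample characters,
# e.g. 'A' -> Lu, '5' -> Nd, '!' -> Po, ' ' -> Zs, '\n' -> Cc.
for ch in ['A', 'z', '5', '!', ' ', '\n', '\x85', 'é']:
    print(repr(ch), unicodedata.category(ch))

# Hypothetical variant: keep letters (L*), numbers (N*), punctuation (P*) and symbols (S*);
# replace control/format (C*) and separator (Z*) characters with a placeholder.
KEEP_MAJOR = {'L', 'N', 'P', 'S'}

def replace_nonprintable(text, placeholder='.'):
    return ''.join(c if unicodedata.category(c)[0] in KEEP_MAJOR else placeholder
                   for c in text)

print(replace_nonprintable('Ab1!\n\r é'))  # Ab1!...é
```

Note that the space character is category `Zs`, so this variant replaces it too; add `'Z'` to the set (or test for `' '` explicitly) if you want to keep spaces.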

Nimantha
Gardener