44

If I call os.urandom(64), I am given 64 random bytes. With reference to "Convert bytes to a Python string", I tried:

import os

a = os.urandom(64)
a.decode()
a.decode("utf-8")

but got a traceback stating that the bytes are not valid UTF-8:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 0: invalid start byte

with the bytes

b'\x8bz\xaf$\xb6\x93q\xef\x94\x99$\x8c\x1eO\xeb\xed\x03O\xc6L%\xe70\xf9\xd8
\xa4\xac\x01\xe1\xb5\x0bM#\x19\xea+\x81\xdc\xcb\xed7O\xec\xf5\\}\x029\x122
\x8b\xbd\xa9\xca\xb2\x88\r+\x88\xf0\xeaE\x9c'

Is there a foolproof method to decode these bytes into some string representation? I am generating pseudo-random tokens to keep track of related documents across multiple database engines.

Community
user1876508
  • Odd way of doing it... Why not just have a more "central" db that generates its own ID, which refers to the other IDs...? Or, instead of using `urandom` - why not use a uuid4 or similar? – Jon Clements Jul 30 '13 at 23:24
  • Can this be also used to generate a random seed? – Charlie Parker Jul 03 '16 at 20:58
  • Django's generate random string logic. https://github.com/django/django/blob/master/django/utils/crypto.py#L51 – bgth Jan 27 '17 at 15:29

6 Answers

77

The code below will work on both Python 2.7 and 3:

from base64 import b64encode
from os import urandom

random_bytes = urandom(64)
token = b64encode(random_bytes).decode('utf-8')
Chen A.
user1876508
  • you don't need to decode it only b64encode is enough. >>> rstring = os.urandom(16) >>> rstring b'\xaf\xec)Uf\x1fb\x8dQ\xfa\xc0\x95\x9c\xd1T\x97' >>> b64encode(rstring) b'r+wpVWYfYo1R+sCVnNFUlw==' >>> b64encode(rstring).decode('utf-8') 'r+wpVWYfYo1R+sCVnNFUlw==' – Muneeb Ejaz May 28 '22 at 15:36
14

You have random bytes; I'd be very surprised if that ever was decodable to a string.

If you have to have a unicode string, decode from Latin-1:

a.decode('latin1')

because it maps each byte one-to-one to the Unicode code point with the same value.
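A minimal sketch of that round trip, assuming Python 3 (the variable names are illustrative and not part of the answer): because every byte value has a matching code point, decoding cannot fail, and encoding back to Latin-1 recovers the original bytes.

```python
import os

raw = os.urandom(64)

# Latin-1 maps each byte 0x00-0xFF to the code point with the same value,
# so decoding never fails, whatever the bytes are.
text = raw.decode('latin1')

# One character per byte, and the mapping is reversible.
assert len(text) == len(raw)
assert text.encode('latin1') == raw
```

The resulting string can contain control characters and other non-printable code points, so it may not be safe to display or to store in every database column type.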

Martijn Pieters
10

You can use base-64 encoding. In this case:

import codecs
import os
a = os.urandom(64)
codecs.encode(a, 'base64')  # a.encode('base-64') only works on Python 2

Also note that I'm using encode here rather than decode, as decode is trying to take it from whatever format you specify into unicode. So in your example, you're treating the random bytes as if they form a valid utf-8 string, which is rarely going to be the case with random bytes.
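To illustrate the encode/decode distinction, here is a small sketch for Python 3 using the base64 module (variable names are illustrative): the random bytes are first encoded into ASCII-only bytes, and only that result is decoded into text.

```python
import base64
import os

raw = os.urandom(64)

encoded = base64.b64encode(raw)   # bytes containing only ASCII characters
token = encoded.decode('ascii')   # safe: Base64 output is always ASCII

# The original bytes can be recovered from the string.
assert base64.b64decode(token) == raw
```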

Rob Watts
3

Are you sure you need 64 bytes represented as a string?

Maybe what you really need is an N-bit token? If so, use secrets. The secrets module provides functions for generating secure tokens, suitable for applications such as password resets, hard-to-guess URLs, and similar.

import secrets

>>> secrets.token_bytes(16)  
b'\xebr\x17D*t\xae\xd4\xe3S\xb6\xe2\xebP1\x8b'

>>> secrets.token_hex(16)  
'f9bf78b9a18ce6d46a0cd2b0b86df9da'

>>> secrets.token_urlsafe(16)  
'Drmhze6EPcv0fN_81Bj-nA'

Or maybe you need a random string that is 64 characters long?

import string
import secrets
alphabet = string.ascii_letters + string.digits
password = ''.join(secrets.choice(alphabet) for i in range(64))
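As a rough guide to the sizes involved with the token functions above (a sketch, assuming Python 3.6+ where secrets is available): the argument is a number of bytes, not characters, so the resulting string lengths differ.

```python
import secrets

n = 16  # number of random bytes requested

hex_token = secrets.token_hex(n)      # two hex digits per byte
url_token = secrets.token_urlsafe(n)  # URL-safe Base64, '=' padding stripped

print(len(secrets.token_bytes(n)))  # 16
print(len(hex_token))               # 32
print(len(url_token))               # 22
```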
Alexander C
1

The easy way:

import os

a = str(os.urandom(64))
print(f"the: {a}")
print(type(a))
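On Python 3, str() applied to a bytes object returns its printable representation rather than decoding it. A small sketch of what that means (assuming Python 3; using ast.literal_eval to reverse it is an addition here, not part of the answer):

```python
import ast
import os

raw = os.urandom(64)
text = str(raw)

# The string is the repr of the bytes object, so it starts with b' or b".
assert text == repr(raw)
assert text.startswith(("b'", 'b"'))

# It can be turned back into bytes by parsing it as a Python literal.
assert ast.literal_eval(text) == raw
```

The string therefore carries the b'...' wrapper and backslash escapes, which makes it longer than the original 64 bytes and of variable length.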
Ahmad Al ALloush
0

When I tried to decode the output of urandom(16) with the utf-8 codec, I got a UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte

This is what I ended up doing.

import os
import binascii

a = binascii.hexlify(os.urandom(32)).decode()
print(a)

fd78f19c8bdcd7bc086d5a34b8d0ebccbd501fd2eea18e46699bb52efa48ac3c
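On Python 3.5+, the same hex text is also available without binascii through bytes.hex(), and bytes.fromhex() reverses it. A minimal sketch (variable names are illustrative):

```python
import binascii
import os

raw = os.urandom(32)

a = raw.hex()  # same text as binascii.hexlify(raw).decode()
assert a == binascii.hexlify(raw).decode()

# Two hex digits per byte, and the conversion is reversible.
assert len(a) == 64
assert bytes.fromhex(a) == raw
```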
ElJeffe