58

For a project, I need a method of creating thousands of random strings while keeping collisions low. I'm looking for them to be only 12 characters long and uppercase only. Any suggestions?

Brandon
  • 3
    You mean you don't want any lowercase digits? – martineau Aug 19 '13 at 17:02
  • Hmm, yeah, that should be clarified :) – Maarten Bodewes Aug 19 '13 at 17:02
  • Don't forget to read this page about [the default random number generator in python](http://docs.python.org/2/library/random.html). The chance of collisions seems to be fully dependent on the size of the "random strings", but that does not mean that an attacker cannot re-create the random numbers; the random numbers generated are *not cryptographically secure*. – Maarten Bodewes Aug 19 '13 at 17:10
  • Hah, right. I meant alphanumeric. – Brandon Aug 20 '13 at 15:08

7 Answers

134

CODE:

from random import choice
from string import ascii_uppercase

print(''.join(choice(ascii_uppercase) for i in range(12)))

OUTPUT:

5 examples:

QPUPZVVHUNSN
EFJACZEBYQEB
QBQJJEEOYTZY
EOJUSUEAJEEK
QWRWLIWDTDBD

EDIT:

If you need only digits, use the digits constant instead of the ascii_uppercase one from the string module.

3 examples:

229945986931
867348810313
618228923380
Bakudan
Peter Varo
  • 4
    Yeah, well, this is misleading: *"12 digits long and uppercase"* -- since digits can't be uppercased – Peter Varo Aug 19 '13 at 17:01
  • And if you need alphanumeric, i.e. ASCII uppercase plus digits, then `from string import digits` and `print(''.join(choice(ascii_uppercase + digits) for i in range(12)))` – Sandeep Kanabar Jan 05 '17 at 12:45
  • Does this give a unique ID each time? What if I call this function from multiple threads (e.g. 2 of them) 10000 times? What is the probability of a collision, i.e. getting the same ID at a given point in time? – AnilJ Sep 06 '17 at 22:43
  • @AnilJ for further info on how the `random` module is working, please read the official documentation on it: https://docs.python.org/3/library/random.html – Peter Varo Sep 07 '17 at 07:44
  • Well, digits is not on Python3. You can use `string.hexdigits` to get a mix of '0123456789abcdefABCDEF', or just `string.digits + string.ascii_letters` for all letters. – goetz Oct 31 '17 at 01:20
  • @goetzc `string.digits` is in Python 3 all the way back till 3.0. – TheDiveO Jun 22 '18 at 18:53
  • @PeterVaro Few years late, but can you elaborate on that ? I do not understand how a digit can be uppercased. – Itération 122442 Jul 26 '21 at 15:09
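On the collision question raised in the comments above, a rough estimate is possible with the birthday approximation. The sketch below assumes every character is drawn uniformly and independently; the function name `collision_probability` is made up for illustration and is not part of any library.

```python
import math


def collision_probability(n_strings: int, alphabet_size: int = 26, length: int = 12) -> float:
    """Approximate chance that at least two of n_strings random strings are equal.

    Uses the birthday approximation p ~= 1 - exp(-k * (k - 1) / (2 * N)),
    where N is the number of possible strings. Assumes uniform, independent draws.
    """
    space = alphabet_size ** length  # N: number of distinct strings
    exponent = -n_strings * (n_strings - 1) / (2 * space)
    return -math.expm1(exponent)  # expm1 keeps precision for tiny probabilities


if __name__ == '__main__':
    for k in (1_000, 20_000, 1_000_000):
        print(k, collision_probability(k))
```

For a few thousand 12-letter uppercase strings the estimate comes out on the order of one in a billion or smaller, so duplicates are unlikely but not impossible; if uniqueness is a hard requirement, it still has to be checked explicitly.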
23

With Django, you can use the get_random_string function from the django.utils.crypto module.

get_random_string(length=12,
    allowed_chars=u'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
    Returns a securely generated random string.

    The default length of 12 with the a-z, A-Z, 0-9 character set returns
    a 71-bit value. log_2((26+26+10)^12) =~ 71 bits

Example:

get_random_string()
u'ngccjtxvvmr9'

get_random_string(4, allowed_chars='bqDE56')
u'DDD6'
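The 71-bit figure quoted in the docstring above can be checked with a short calculation. This is only an illustrative sketch; `entropy_bits` is a hypothetical helper, not a Django function.

```python
import math


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy in a string of `length` symbols drawn uniformly from the alphabet."""
    return length * math.log2(alphabet_size)


if __name__ == '__main__':
    print(round(entropy_bits(62, 12), 2))  # a-z, A-Z, 0-9: about 71.45 bits
    print(round(entropy_bits(26, 12), 2))  # A-Z only: about 56.41 bits
```

Restricting the alphabet to uppercase letters, as the question asks, drops the result from roughly 71 bits to roughly 56 bits for the same length.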

If you don't want to depend on Django, here is a standalone version of it:

Code:

import random
import hashlib
import time

SECRET_KEY = 'PUT A RANDOM KEY WITH 50 CHARACTERS LENGTH HERE !!'

try:
    random = random.SystemRandom()
    using_sysrandom = True
except NotImplementedError:
    import warnings
    warnings.warn('A secure pseudo-random number generator is not available '
                  'on your system. Falling back to Mersenne Twister.')
    using_sysrandom = False


def get_random_string(length=12,
                      allowed_chars='abcdefghijklmnopqrstuvwxyz'
                                    'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'):
    """
    Returns a securely generated random string.

    The default length of 12 with the a-z, A-Z, 0-9 character set returns
    a 71-bit value. log_2((26+26+10)^12) =~ 71 bits
    """
    if not using_sysrandom:
        # This is ugly, and a hack, but it makes things better than
        # the alternative of predictability. This re-seeds the PRNG
        # using a value that is hard for an attacker to predict, every
        # time a random string is required. This may change the
        # properties of the chosen random sequence slightly, but this
        # is better than absolute predictability.
        random.seed(
            hashlib.sha256(
                ("%s%s%s" % (
                    random.getstate(),
                    time.time(),
                    SECRET_KEY)).encode('utf-8')
            ).digest())
    return ''.join(random.choice(allowed_chars) for i in range(length))
Omid Raha
4

Could make a generator:

from string import ascii_uppercase
import random
from itertools import islice

def random_chars(size, chars=ascii_uppercase):
    selection = iter(lambda: random.choice(chars), object())
    while True:
        yield ''.join(islice(selection, size))

random_gen = random_chars(12)
print(next(random_gen))
# LEQIITOSJZOQ
print(next(random_gen))
# PXUYJTOTHWPJ

Then pull from the generator as strings are needed: either call next(random_gen) for one at a time, or take a batch with, for instance, random_200 = list(islice(random_gen, 200)).

Jon Clements
  • 2
    And the advantage of using a generator for this would be? – martineau Aug 19 '13 at 17:10
  • @martineau can take one at a time, set up ones with different variables, can slice off to take n many at a time etc... The main difference is that it's in effect an iterable itself, instead of repeatedly calling a function... – Jon Clements Aug 19 '13 at 17:12
  • Why *wouldn't* you just repeatedly call a function? – user2357112 Aug 19 '13 at 17:50
  • `functools.partial` can fix parameters, and `list(itertools.islice(gen, n))` isn't any better than `[func() for _ in xrange(n)]` – user2357112 Aug 19 '13 at 17:58
  • @user2357112 by building a generator, there's an advantage over resuming its state, than setting up and calling up a function repeatedly... Also the `list` and `islice` will work at the implementation level instead of as a list-comp that could leak its `_` (in Py 2.x) variable and has to build an unnecessary range constraint that's otherwise handled... Also, it's also harder to build on top of functions, rather than streams... – Jon Clements Aug 19 '13 at 18:05
  • Resuming a generator's state vs calling a function repeatedly isn't an advantage, and if you want to set up fixed parameters, `functools.partial` can do that. The fact that `list` and `islice` are in C would be an advantage if there weren't a Python-level generator and several Python-level function calls in the inner loop. Leaking the loop variable is annoying, but no reason to avoid using list comprehensions. – user2357112 Aug 19 '13 at 18:14
  • If you use a generator, getting a single random string is `next(random_chars(n))`, whereas with a regular function it's just `random_chars(n)`. Looping over `k` random strings is `for s in islice(random_chars(n), k):`, whereas with a regular function, it's `for i in xrange(k): s = random_chars(n)`. I find the `islice` and `next` calls to be warning signs that you don't actually want a generator here. – user2357112 Aug 19 '13 at 18:19
  • @user2357112 depends on the use-case... I was just offering another option... If it's to associate a userid in a file (for instance) with a random password, then `dict(zip(fileobj, random_gen))` is perhaps better than using a dict comp with a call() as the value). If it's going to be arbitrarily used then I'd go for the approach already suggested, but what's the point of offering a duplicate answer ;) – Jon Clements Aug 19 '13 at 18:26
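For comparison, the plain-function alternative argued for in the comments above might look like the sketch below. It assumes Python 3 (so `range` instead of `xrange`); the names `random_string` and `random_12` are chosen here for illustration only.

```python
import random
from functools import partial
from string import ascii_uppercase


def random_string(size, chars=ascii_uppercase):
    """Return one random string of `size` characters drawn from `chars`."""
    return ''.join(random.choice(chars) for _ in range(size))


# functools.partial fixes the length, much like random_chars(12) does for the generator.
random_12 = partial(random_string, 12)

# A batch of 200 strings, the counterpart of list(islice(random_gen, 200)).
batch = [random_12() for _ in range(200)]

if __name__ == '__main__':
    print(random_12())
    print(len(batch))
```

Which form reads better depends on the use case: the generator can be passed straight to `zip` or `islice`, while the function is simpler when a single string is needed at a time.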
1
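#!/bin/python3
import random
import string

def f(n: int) -> str:
    # Sample n byte values from the uppercase alphabet, then decode them in one step.
    return bytes(random.choices(string.ascii_uppercase.encode('ascii'), k=n)).decode('ascii')

This should run faster for very large n, since it avoids string concatenation.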
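Whether the bytes-based approach is actually faster can be measured rather than assumed. The sketch below compares it with a plain `''.join` over `random.choices` using `timeit`; the function names `via_join` and `via_bytes` are illustrative, and the timings will vary by machine and Python version, so no result is claimed here.

```python
import random
import string
import timeit


def via_join(n: int) -> str:
    """Build the string by joining n one-character strings."""
    return ''.join(random.choices(string.ascii_uppercase, k=n))


def via_bytes(n: int) -> str:
    """Build the string by sampling byte values and decoding once."""
    return bytes(random.choices(string.ascii_uppercase.encode('ascii'), k=n)).decode('ascii')


if __name__ == '__main__':
    n = 100_000
    for func in (via_join, via_bytes):
        seconds = timeit.timeit(lambda: func(n), number=5)
        print(f'{func.__name__}: {seconds:.3f}s for 5 runs of n={n}')
```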

0

For cryptographically strong pseudo-random bytes you might use the pyOpenSSL wrapper around OpenSSL.

It provides the bytes function to gather a pseudo-random sequence of bytes.

from OpenSSL import rand

b = rand.bytes(7)

BTW, 12 uppercase letters carry a little more than 56 bits of entropy (12 × log2(26) ≈ 56.4). You will only have to read 7 bytes.
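If adding pyOpenSSL is not an option, one alternative (not part of this answer) is the standard library's `secrets` module, available since Python 3.6, which also draws from the operating system's secure random source. The sketch below picks each letter directly instead of converting raw bytes; `secure_upper_string` is a name made up for this example.

```python
import secrets
from string import ascii_uppercase


def secure_upper_string(length: int = 12) -> str:
    """Return `length` uppercase letters chosen with a cryptographically strong RNG."""
    # secrets.choice selects uniformly, so there is no modulo bias to worry about.
    return ''.join(secrets.choice(ascii_uppercase) for _ in range(length))


if __name__ == '__main__':
    print(secure_upper_string())
```

Choosing letters one by one sidesteps the question of how to map 7 random bytes onto 12 letters without skewing the distribution.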

Sylvain Leroux
0

This function generates a random string of uppercase letters of the specified length.

E.g. with length = 6, it generates a random sequence such as:

YLNYVQ

    import random as r

    def generate_random_string(length):
        random_string = ''
        random_str_seq = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        for i in range(length):
            # Pick one character uniformly at random from the alphabet.
            # (The original '-' separator branch could never run, since i < length.)
            random_string += r.choice(random_str_seq)
        return random_string
Manoj Selvin
  • With above code `random_str_seq = "ABC@#$%^!&_+|*()OPQRSTUVWXYZ"` can give you even more complex results. – Iqra. Jan 18 '19 at 11:39
0

A generator function that yields random strings without duplicates, using a set to remember the values already generated. Note that the set costs memory for very long strings or large amounts, and generation will probably slow down as it fills. The generator stops once the requested amount has been yielded or when all possible combinations are exhausted.

Code:

#!/usr/bin/env python

from typing import Generator
from random import SystemRandom as RND
from string import ascii_uppercase, digits


def string_generator(size: int = 1, amount: int = 1) -> Generator[str, None, None]:
    """
    Return x random strings of a fixed length.

    :param size: string length, defaults to 1
    :type size: int, optional
    :param amount: amount of random strings to generate, defaults to 1
    :type amount: int, optional
    :yield: Yield composed random string if unique
    :rtype: Generator[str, None, None]
    """
    CHARS = list(ascii_uppercase + digits)
    LIMIT = len(CHARS) ** size
    count, check, string = 0, set(), ''
    while LIMIT > count < amount:
        string = ''.join(RND().choices(CHARS, k=size))
        if string not in check:
            check.add(string)
            yield string
            count += 1


for my_count, my_string in enumerate(string_generator(12, 20)):
    print(my_count, my_string)

Output:

0 IESUASWBRHPD
1 JGGO1THKLC9K
2 BW04A5GWBA7K
3 KDQTY72BV1S9
4 FAOL5L28VVMN
5 NLDNNBGHTRTI
6 2RV6TE6BCQ8K
7 B79B8FBPUD07
8 89VXXRHPUN41
9 DFC8QJUY6HRB
10 FXYYDKVQHC5Z
11 57KTZE67RSCU
12 389H1UT7N6CI
13 AKZMN9XITAVB
14 6T9ACH3GDAYG
15 CH8RJUQMTMBE
16 SPQ7E02ZLFD3
17 YD6JFXGIF3YF
18 ZUSA2X6OVNCN
19 JQRH6LR229Y4
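The second stopping condition, running out of possible combinations, is easier to see with a very short string length. Below is a compact sketch of the same set-based idea, not the answer's exact code; `unique_strings` is a hypothetical name, and it assumes Python 3.6+ for `choices`.

```python
from random import SystemRandom
from string import ascii_uppercase, digits


def unique_strings(size, amount, chars=ascii_uppercase + digits):
    """Yield up to `amount` distinct random strings of length `size`."""
    rng = SystemRandom()
    limit = len(chars) ** size  # total number of distinct strings available
    seen = set()
    while len(seen) < min(amount, limit):
        candidate = ''.join(rng.choices(chars, k=size))
        if candidate not in seen:
            seen.add(candidate)
            yield candidate


# Only 36 one-character strings exist, so asking for 100 stops at 36.
all_single = list(unique_strings(size=1, amount=100))

if __name__ == '__main__':
    print(len(all_single))
```

As the set approaches the limit, more and more draws are rejected as duplicates, which is the slowdown the answer warns about.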
FifthAxiom