10

I have a string with 50ish elements. I need to randomize it and generate a much longer string. I found that random.sample() only picks unique elements, which is great but does not fit my purpose. Is there a way to allow repetitions in Python, or do I need to manually build a loop?

S. W. G.

4 Answers

18

You can use numpy.random.choice. Its size argument sets how many samples to draw, and its replace argument controls whether elements may repeat. Something like the following should work.

import numpy as np
choices = np.random.choice([1, 2, 3], size=10, replace=True)
# array([2, 1, 2, 3, 3, 1, 2, 2, 3, 2])

If your input is a string, say something like my_string = 'abc', you can use:

choices = np.random.choice([char for char in my_string], size=10, replace=True)
# array(['c', 'b', 'b', 'c', 'b', 'a', 'a', 'a', 'c', 'c'], dtype='<U1')

Then get a new string out of it with:

new_string = ''.join(choices)
# 'cbbcbaaacc'

Performance

Timing the three answers so far, plus random.choices from the comments, on producing 1000 samples from the string 'abc' (skipping the ''.join part, since we all used it) gives:

  • numpy.random.choice([char for char in 'abc'], size=1000, replace=True):

    34.1 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

  • random.choices('abc', k=1000):

    269 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

  • [random.choice('abc') for _ in range(1000)]:

    924 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

  • [random.sample('abc',1)[0] for _ in range(1000)]:

    4.32 ms ± 67.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

NumPy is fastest by far for the sampling step alone. Once you include the ''.join step, though, numpy and random.choices run neck and neck, and both are about three times faster than the next-fastest option for this example.
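If you want to repeat the comparison outside a notebook, here is a minimal sketch using the standard-library timeit module. It covers only the three pure-Python variants and includes the ''.join step. The function names and the time_all helper are made up for illustration, and the numbers you get will depend on your machine and Python version.

```python
import random
import timeit

ALPHABET = "abc"  # same toy input as in the timings above
N = 1000          # number of characters to draw


def with_choices():
    # One call that samples with replacement
    return "".join(random.choices(ALPHABET, k=N))


def with_choice_loop():
    # One random.choice call per character
    return "".join([random.choice(ALPHABET) for _ in range(N)])


def with_sample_loop():
    # One single-element random.sample call per character
    return "".join([random.sample(ALPHABET, 1)[0] for _ in range(N)])


def time_all(number=200):
    """Return the average seconds per call for each variant (hypothetical helper)."""
    variants = (with_choices, with_choice_loop, with_sample_loop)
    return {fn.__name__: timeit.timeit(fn, number=number) / number for fn in variants}
```

Calling time_all() returns a dict of average seconds per call, which you can print or sort to see the ranking on your own setup.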

Engineero
  • Did you test `random.choices` with the `k` argument to set the length? – S. W. G. May 16 '18 at 21:57
  • 1
    I got `269 us +/- 6.43 us` per loop for `random.choices([char for char in 'abc'], k=1000)`. Numpy is still fastest. – Engineero May 17 '18 at 14:49
  • How did you measure the time taken and what does the `replace` argument do? – S. W. G. Sep 17 '21 at 00:33
  • 1
    @S.Redrum I measured the time using the `%timeit` magic in a Jupyter notebook. It runs the same command for many iterations and reports timing statistics. The `replace` argument just means that it samples with replacement, so it doesn't exhaust the list while it's sampling. I.e., elements can be repeated. – Engineero Sep 18 '21 at 20:27
0

You could do something like this:

import random
alphabet = 'abcdef'  # not named `dict`, which would shadow the built-in type
''.join([random.choice(alphabet) for _ in range(50)])
tremby
0

I'm not saying this is the most efficient approach (you should probably use choice here), but consider it:

import random
a = ['a','b','c']
' '.join([random.sample(a,1)[0] for _ in range(6)])
Anton vBR
0

I found this. I forgot to mention that I am on Python 3.6, which is the version that added random.choices:

import random

DICTIONARY_NUMBERS_HEX = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F']
block_text = "".join(random.choices(DICTIONARY_NUMBERS_HEX, k=50))

random.choices samples with replacement, so elements can repeat. The k=50 keyword argument sets how many elements to draw.
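To make the difference from random.sample concrete, here is a small sketch assuming a 16-character hex alphabet like the one above. The names HEX_DIGITS and sample_failed are only illustrative.

```python
import random

HEX_DIGITS = "0123456789ABCDEF"  # 16 symbols, illustrative alphabet

# random.sample draws without replacement, so asking for more items
# than the population holds raises ValueError.
try:
    random.sample(HEX_DIGITS, 50)
    sample_failed = False
except ValueError:
    sample_failed = True

# random.choices draws with replacement, so k may exceed the population size.
block_text = "".join(random.choices(HEX_DIGITS, k=50))

print(sample_failed)    # True
print(len(block_text))  # 50
```

Because 50 is larger than 16, some characters in block_text are guaranteed to repeat.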

S. W. G.