I have a string with 50 or so elements. I need to randomize it and generate a much longer string. I found that random.sample()
only picks unique elements, which is great but does not fit my purpose. Is there a way to allow repetitions in Python, or do I need to manually build a loop?

- Why doesn't random.sample() fit your purpose exactly? – Anton vBR May 08 '18 at 20:59
- @AntonvBR It picks unique elements, so if my list length is K and I need any number greater than K, it raises an error; it does not allow repetition. – S. W. G. May 08 '18 at 21:01
- If one of the provided answers works for you, please mark it as accepted. – Engineero May 09 '18 at 16:06
- Can anybody explain why it is called sampling with replacement rather than sampling with repetition? – scarface Dec 28 '20 at 15:00
4 Answers
You can use numpy.random.choice. It has an argument to specify how many samples you want, and an argument to specify whether you want replacement. Something like the following should work:
import numpy as np
choices = np.random.choice([1, 2, 3], size=10, replace=True)
# array([2, 1, 2, 3, 3, 1, 2, 2, 3, 2])
If your input is a string, say my_string = 'abc', you can use:
choices = np.random.choice([char for char in my_string], size=10, replace=True)
# array(['c', 'b', 'b', 'c', 'b', 'a', 'a', 'a', 'c', 'c'], dtype='<U1')
Then get a new string out of it with:
new_string = ''.join(choices)
# 'cbbcbaaacc'
Performance
Timing the three answers so far and random.choices from the comments (skipping the ''.join part, since we all used it), producing 1000 samples from the string 'abc', we get:
numpy.random.choice([char for char in 'abc'], size=1000, replace=True): 34.1 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
random.choices('abc', k=1000): 269 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
[random.choice('abc') for _ in range(1000)]: 924 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
[random.sample('abc',1)[0] for _ in range(1000)]: 4.32 ms ± 67.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
NumPy is fastest by far. If you include the ''.join step, NumPy and random.choices come out neck and neck, with both being about three times faster than the next fastest approach for this example.
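For anyone who wants to rerun the comparison outside a notebook, here is a rough sketch using the standard-library timeit module. The function names, the repeat counts, and the optional NumPy import are my own assumptions, not the exact setup behind the numbers above, and absolute timings will differ by machine and Python version.

```python
import random
import timeit

ALPHABET = "abc"
N_SAMPLES = 1000


def with_choices():
    # One call that draws N_SAMPLES items with replacement.
    return random.choices(ALPHABET, k=N_SAMPLES)


def with_choice_loop():
    # One random.choice call per item.
    return [random.choice(ALPHABET) for _ in range(N_SAMPLES)]


def with_sample_loop():
    # One single-element random.sample call per item.
    return [random.sample(ALPHABET, 1)[0] for _ in range(N_SAMPLES)]


candidates = {
    "random.choices": with_choices,
    "random.choice loop": with_choice_loop,
    "random.sample loop": with_sample_loop,
}

# NumPy is optional here (assumption: it may not be installed everywhere).
try:
    import numpy as np

    candidates["numpy.random.choice"] = lambda: np.random.choice(
        list(ALPHABET), size=N_SAMPLES, replace=True
    )
except ImportError:
    pass


def time_all(repeats=3, number=50):
    """Return the best observed time per call, in microseconds, for each candidate."""
    results = {}
    for name, func in candidates.items():
        best = min(timeit.repeat(func, repeat=repeats, number=number))
        results[name] = best / number * 1e6
    return results


if __name__ == "__main__":
    for name, usec in sorted(time_all().items(), key=lambda kv: kv[1]):
        print(f"{name:<22} {usec:10.1f} µs per call")
```

Taking the minimum over repeats is a common way to reduce noise from other processes; the notebook figures above report mean ± std. dev. instead, so the two are not directly comparable.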

- Did you test `random.choices` with the `k` argument to indicate the length? – S. W. G. May 16 '18 at 21:57
- I got `269 us +/- 6.43 us` per loop for `random.choices([char for char in 'abc'], k=1000)`. NumPy is still fastest. – Engineero May 17 '18 at 14:49
- How did you measure the time taken and what does the `replace` argument do? – S. W. G. Sep 17 '21 at 00:33
- @S.Redrum I measured the time taken using the `%timeit` magic in a Jupyter notebook. It automates running many iterations of the same command and reports timing statistics. The `replace` argument just means that it samples with repetition, so it doesn't exhaust the list while it's sampling; i.e., elements can be repeated. – Engineero Sep 18 '21 at 20:27
You could do something like this:
import random
alphabet = 'abcdef'  # renamed from dict, which shadows the built-in type
''.join([random.choice(alphabet) for _ in range(50)])
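If you need this in more than one place, the one-liner can be wrapped in a small helper. This is only a sketch: the function name random_string and the optional seed parameter are hypothetical, added here so that results can be reproduced while testing.

```python
import random


def random_string(alphabet, length, seed=None):
    """Build a string of `length` characters drawn from `alphabet` with repetition.

    `seed` is an optional, hypothetical convenience parameter: passing the
    same seed yields the same string, which helps when testing.
    """
    rng = random.Random(seed)  # private generator; leaves the global state alone
    return "".join(rng.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(random_string("abcdef", 50))
```

Using a private random.Random instance instead of the module-level functions is a design choice, so that seeding does not affect other code that relies on the global generator.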

I'm not saying this is the most efficient option (you should probably use random.choice here), but consider drawing one element at a time with random.sample:
import random
a = ['a','b','c']
' '.join([random.sample(a,1)[0] for _ in range(6)])

I found this. I forgot to mention that I am on Python 3.6, where random.choices is available:
import random

DICTIONARY_NUMBERS_HEX = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F']
block_text = "".join(random.choices(DICTIONARY_NUMBERS_HEX, k=50))
random.choices samples with replacement, so elements can repeat; the k=50 keyword argument sets how many elements are drawn.
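To make the difference from random.sample concrete, here is a small sketch. The HEX_DIGITS string and the variable names are my own shorthand for the list above, not part of either API.

```python
import random

HEX_DIGITS = "0123456789ABCDEF"  # 16 symbols, same content as the list above

# random.sample draws without replacement, so asking for more items than
# the population holds raises ValueError.
try:
    random.sample(HEX_DIGITS, k=50)
except ValueError as exc:
    sample_error = exc
else:
    sample_error = None

# random.choices draws with replacement, so k may exceed the population size.
block_text = "".join(random.choices(HEX_DIGITS, k=50))

if __name__ == "__main__":
    print("random.sample failed with:", sample_error)
    print("random.choices produced:", block_text)
```

With 50 draws from only 16 symbols, at least one symbol is guaranteed to appear more than once.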