Sampling with repetition in Python

Question

I have a string with about 50 elements. I need to randomize it and generate a much longer string. I found that random.sample() only picks unique elements, which is great but does not fit my purpose. Is there a way to allow repetitions in Python, or do I need to build a loop manually?

@AntonvBR It picks unique elements, so if my list length is K and I ask for any number greater than K, it raises an error; it does not allow repetition. — S. W. G., May 08 '18 at 21:01
If one of the provided answers works for you, please mark it as accepted. — Engineero, May 09 '18 at 16:06
Can anybody explain why it is in fact called sampling with replacement instead of sampling with repetition? — scarface, Dec 28 '20 at 15:00

Engineero · Accepted Answer · 2021-08-26T15:28:04.380

You can use numpy.random.choice. It has an argument to specify how many samples you want, and an argument to specify whether you want replacement. Something like the following should work.

import numpy as np
choices = np.random.choice([1, 2, 3], size=10, replace=True)
# array([2, 1, 2, 3, 3, 1, 2, 2, 3, 2])

If your input is a string, say something like my_string = 'abc', you can use:

choices = np.random.choice([char for char in my_string], size=10, replace=True)
# array(['c', 'b', 'b', 'c', 'b', 'a', 'a', 'a', 'c', 'c'], dtype='<U1')

Then get a new string out of it with:

new_string = ''.join(choices)
# 'cbbcbaaacc'
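
On newer NumPy versions the same idea can be written with a Generator object, which also makes the result reproducible through a seed. This is only a minimal sketch, assuming NumPy 1.17 or later; the helper name resample_string and the seed value are made up for illustration and are not part of NumPy.

```python
import numpy as np

def resample_string(source, length, seed=None):
    """Draw `length` characters from `source` with replacement (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # seed=None gives fresh entropy on each call
    picks = rng.choice(list(source), size=length, replace=True)
    return ''.join(picks)

# Same seed, same output; drop the seed for a different string on every run.
new_string = resample_string('abc', 10, seed=0)
print(new_string)
```

Passing replace=False here would raise an error as soon as length exceeds len(source), which is the same limitation the question ran into with random.sample.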

Performance

Timing the three answers so far and random.choices from the comments (skipping the ''.join part since we all used it) producing 1000 samples from the string 'abc', we get:

numpy.random.choice([char for char in 'abc'], size=1000, replace=True):
34.1 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

random.choices('abc', k=1000):
269 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

[random.choice('abc') for _ in range(1000)]:
924 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

[random.sample('abc',1)[0] for _ in range(1000)]:
4.32 ms ± 67.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

NumPy is fastest by far for the sampling step alone. If you include the ''.join step, NumPy and random.choices end up neck and neck, with both being three times faster than the next fastest for this example.
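
If you are not working in a notebook, a rough version of this comparison can be reproduced with the standard-library timeit module. The sketch below covers only the three pure-Python variants; the function names, the repeat counts, and the time_per_call helper are my own choices for illustration, and the absolute numbers will differ from machine to machine.

```python
import random
import timeit

ALPHABET = 'abc'
N_SAMPLES = 1000

def with_choices():
    return random.choices(ALPHABET, k=N_SAMPLES)

def with_choice_loop():
    return [random.choice(ALPHABET) for _ in range(N_SAMPLES)]

def with_sample_loop():
    return [random.sample(ALPHABET, 1)[0] for _ in range(N_SAMPLES)]

def time_per_call(func, number=50, repeat=3):
    """Best average time per call, in seconds, over `repeat` batches."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number

results = {
    func.__name__: time_per_call(func)
    for func in (with_choices, with_choice_loop, with_sample_loop)
}
for name, seconds in results.items():
    print(f'{name}: {seconds * 1e6:.1f} µs per call')
```

Taking the minimum over several batches is a common way to reduce noise from other processes; it is not the same statistic as the mean ± std. dev. that the figures above report.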

Did you test `random.choices` with the `k` argument to indicate the length? — S. W. G., May 16 '18 at 21:57
I got `269 us +/- 6.43 us` per loop for `random.choices([char for char in 'abc'], k=1000)`. Numpy is still fastest. — Engineero, May 17 '18 at 14:49
How did you measure the time taken and what does the `replace` argument do? — S. W. G., Sep 17 '21 at 00:33
@S.Redrum I measured the time taken using the `%timeit` magic in a Jupyter notebook. It automates running a bunch of iterations of the same command and gives timing statistics. The `replace` argument just means that it samples with repetition, so it doesn't exhaust the list while it's sampling. I.e., elements can be repeated. — Engineero, Sep 18 '21 at 20:27

score 0 · Answer 2 · answered May 08 '18 at 21:02

You could do something like this:

import random
alphabet = 'abcdef'  # renamed from `dict`, which would shadow the built-in type
''.join([random.choice(alphabet) for _ in range(50)])

tremby

score 0 · Answer 3 · answered May 08 '18 at 21:02

I'm not saying this is the most efficient way (you should probably use random.choice here), but consider it:

import random
a = ['a','b','c']
' '.join([random.sample(a,1)[0] for _ in range(6)])

Anton vBR

score 0 · Answer 4 · answered May 10 '18 at 10:31

I found this. I forgot to mention that I am on Python 3.6, which is the version that added random.choices:

import random

DICTIONARY_NUMBERS_HEX = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F']
block_text = "".join(random.choices(DICTIONARY_NUMBERS_HEX, k=50))

Passing the named argument k=50 draws 50 elements with replacement, so elements can repeat.
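
To make the difference from random.sample concrete, here is a small sketch that requests 50 draws from the same 16 hex digits with both functions. The variable names HEX_DIGITS and sample_failed are mine, chosen for illustration.

```python
import random

HEX_DIGITS = '0123456789ABCDEF'  # 16 symbols, same alphabet as the list above

# random.sample draws without replacement, so k cannot exceed the population size.
try:
    random.sample(HEX_DIGITS, 50)
    sample_failed = False
except ValueError:
    sample_failed = True

# random.choices draws with replacement, so k may be larger than the population.
block_text = ''.join(random.choices(HEX_DIGITS, k=50))

print(sample_failed)    # True
print(len(block_text))  # 50
```

Because 50 draws come from only 16 symbols, at least one digit is guaranteed to appear more than once in block_text.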
