You can also use `np.random.Generator.choice`.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.default_rng().choice(100, size=(100, 4)), columns=['A','B','C','D'])
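If you need the same frame on every run, you can seed the `Generator`. The following is a small sketch; the seed `42` is arbitrary. Passing an integer `100` to `choice` samples from `np.arange(100)`, i.e. the values 0 to 99, the same range as `integers(0, 100)`.

```python
import numpy as np
import pandas as pd

# Seeding the Generator makes the frame reproducible; 42 is an arbitrary choice.
rng = np.random.default_rng(42)

# choice(100, ...) samples from np.arange(100), i.e. integers 0..99 inclusive.
df = pd.DataFrame(rng.choice(100, size=(100, 4)), columns=['A', 'B', 'C', 'D'])

# A fresh Generator with the same seed reproduces exactly the same values.
df_again = pd.DataFrame(np.random.default_rng(42).choice(100, size=(100, 4)),
                        columns=['A', 'B', 'C', 'D'])
print(df.equals(df_again))  # True
```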
The advantage of this method over `integers` is that you can sample from any list or array you like. For example, to generate a random sample from `[2, 5, 10]`:
df = pd.DataFrame(np.random.default_rng().choice([2,5,10], size=(100, 4)), columns=['A','B','C','D'])
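The candidates do not even have to be numbers. The following sketch uses a hypothetical list of strings as the population; every cell of the resulting frame is one of those values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng()

# Any 1-D array-like can serve as the population, including strings.
colors = ['red', 'green', 'blue']  # hypothetical example values
df = pd.DataFrame(rng.choice(colors, size=(100, 4)), columns=['A', 'B', 'C', 'D'])

# Every cell is one of the listed values.
print(set(np.unique(df.to_numpy())) <= set(colors))  # True
```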
You can even attach a probability distribution to the sample entries. For example, to pick 2 with p=0.8 and 5 with p=0.2, pass the `p=` argument:
df = pd.DataFrame(np.random.default_rng().choice([2,5], p=[.8,.2], size=(100, 4)), columns=['A','B','C','D'])
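To sanity-check the weights, you can compare the empirical frequencies with `p`. This sketch uses a larger frame (10,000 rows, an arbitrary size) and an arbitrary seed so the proportions settle close to 0.8 and 0.2; note that `p` needs one entry per candidate and must sum to 1.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# p must have one entry per candidate and sum to 1.
df = pd.DataFrame(rng.choice([2, 5], p=[.8, .2], size=(10_000, 4)),
                  columns=['A', 'B', 'C', 'D'])

# Empirical frequencies over all 40,000 cells should be close to p.
freq = pd.Series(df.to_numpy().ravel()).value_counts(normalize=True)
print(freq)
```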
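Also, with a `Generator`, `choice` is about as fast as `integers` and faster than the legacy `np.random.randint`, at least in the benchmark below (the timings include DataFrame construction).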
%timeit pd.DataFrame(np.random.default_rng().choice(100, size=(100_000,4)), columns=[*'ABCD'])
# 3.34 ms ± 308 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.DataFrame(np.random.default_rng().integers(0, 100, size=(100_000,4)), columns=[*'ABCD'])
# 3.81 ms ± 708 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.DataFrame(np.random.randint(100, size=(100_000,4)), columns=[*'ABCD'])
# 6.78 ms ± 776 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
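Outside IPython, where `%timeit` is unavailable, a rough equivalent can be written with the standard `timeit` module. This is a sketch, not a rigorous benchmark: the repeat and loop counts are arbitrary, and absolute numbers will vary with hardware, NumPy and pandas versions, so no expected output is shown.

```python
import timeit

import numpy as np
import pandas as pd

SIZE = (100_000, 4)
COLS = [*'ABCD']

# Each candidate builds the same-shaped DataFrame of integers in [0, 100).
candidates = {
    'Generator.choice': lambda: pd.DataFrame(
        np.random.default_rng().choice(100, size=SIZE), columns=COLS),
    'Generator.integers': lambda: pd.DataFrame(
        np.random.default_rng().integers(0, 100, size=SIZE), columns=COLS),
    'np.random.randint': lambda: pd.DataFrame(
        np.random.randint(100, size=SIZE), columns=COLS),
}

# Best of 5 repeats of 20 calls each, reported as milliseconds per call.
results = {}
for name, fn in candidates.items():
    best = min(timeit.repeat(fn, number=20, repeat=5)) / 20
    results[name] = best * 1e3
    print(f'{name:>20}: {results[name]:.2f} ms')
```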