I am trying to create a list of 10,000 random numbers between 1 and 1000. But I want 80-85% of the numbers to come from the same category (I mean some 100 numbers out of these should appear 80% of the time in the list of random numbers) and the rest to appear around 15-20% of the time. Any idea if this can be done in Python/NumPy/SciPy? Thanks.
Can you be more specific? Are these 100 numbers placed consecutively? – martianwars Dec 24 '16 at 06:20
No. They can be spread out any way but should be between 1 and 1000. – Subhankar Ghosh Dec 24 '16 at 06:21
Why do you need a special function then? Just use `random.randint()` twice: first to select between those 100 and the rest, and then to choose among them. – martianwars Dec 24 '16 at 06:22
How are the numbers categorized? – wwii Dec 24 '16 at 07:06
Possible duplicate of [A weighted version of random.choice](http://stackoverflow.com/questions/3679694/a-weighted-version-of-random-choice) – wwii Dec 24 '16 at 07:22
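The duplicate linked above points at weighted selection. A minimal sketch of that idea with the standard library's `random.choices` (Python 3.6+), assuming the 100 frequent numbers are picked at random from 1..1000 and that an approximate, rather than exact, 80/20 split is acceptable:

```python
import random

# Assumption: 100 "frequent" numbers drawn at random from 1..1000; the rest are "rare"
population = list(range(1, 1001))
frequent = set(random.sample(population, 100))

# Per-number weights: the 100 frequent numbers share 80% of the probability mass,
# the 900 rare numbers share the remaining 20%
weights = [0.8 / 100 if n in frequent else 0.2 / 900 for n in population]

# random.choices draws k items with replacement according to the weights
out = random.choices(population, weights=weights, k=10_000)

share = sum(n in frequent for n in out) / len(out)
print(f"fraction from frequent set: {share:.3f}")  # close to 0.8; exact value varies per run
```

Since every draw is independent, the frequent share fluctuates slightly around 0.8 from run to run.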
2 Answers
This can be easily done using one call to `random.randint()` to select a list and another call to `random.choice()` on the selected list. I'll assume the list `frequent` contains the 100 elements that are to be chosen 80 percent of the time, and `rare` contains the 900 elements that are to be chosen 20 percent of the time.
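
    import random

    # Example setup (the answer only assumes these lists exist):
    # 100 "frequent" numbers from 1..1000; the other 900 are "rare"
    frequent = random.sample(range(1, 1001), 100)
    rare = sorted(set(range(1, 1001)) - set(frequent))

    a = random.randint(1, 5)  # equals 1 with probability 1/5
    if a == 1:
        # Case for rare numbers (probability 1/5)
        choice = random.choice(rare)
    else:
        # Case for frequent numbers (probability 4/5)
        choice = random.choice(frequent)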
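
The snippet above draws a single number. A sketch of repeating that two-step draw to build the whole list, again assuming the 100 frequent numbers are picked at random; `make_list` and its parameters are hypothetical names, and `p_rare` replaces the fixed 1-in-5 test so the rare share can be set anywhere in the 15-20% range the question mentions:

```python
import random


def make_list(n=10_000, n_frequent=100, p_rare=0.2, low=1, high=1000):
    """Hypothetical helper: n draws, about (1 - p_rare) of them from a random frequent subset."""
    numbers = list(range(low, high + 1))
    frequent = random.sample(numbers, n_frequent)
    frequent_set = set(frequent)
    rare = [x for x in numbers if x not in frequent_set]

    out = []
    for _ in range(n):
        # random.random() < p_rare is true with probability p_rare,
        # generalizing the randint(1, 5) == 1 test (which fixes p_rare at 1/5)
        if random.random() < p_rare:
            out.append(random.choice(rare))
        else:
            out.append(random.choice(frequent))
    return out, frequent_set


# Aim for roughly 82% of the numbers coming from the frequent set
out, frequent_set = make_list(p_rare=0.18)
share = sum(x in frequent_set for x in out) / len(out)
```

Because each draw is independent, the frequent share lands close to, but not exactly at, `1 - p_rare`; the NumPy answer below fixes the counts exactly instead.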

martianwars
Here's an approach:

    import numpy as np

    a = np.arange(1, 1001)  # Input array to extract numbers from

    # Select 100 random unique numbers from the input array and also store the leftovers
    p1 = np.random.choice(a, size=100, replace=False)
    p2 = np.setdiff1d(a, p1)

    # Get random indices for indexing into p1 and p2
    p1_idx = np.random.randint(0, p1.size, 8000)
    p2_idx = np.random.randint(0, p2.size, 2000)

    # Index, concatenate and randomize their positions
    out = np.random.permutation(np.hstack((p1[p1_idx], p2[p2_idx])))
Let's verify after a run:

    In [78]: np.in1d(out, p1).sum()
    Out[78]: 8000

    In [79]: np.in1d(out, p2).sum()
    Out[79]: 2000
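
The approach above fixes the counts at exactly 8000 and 2000. If approximate proportions are enough, a sketch using NumPy's `Generator.choice` with a `p` probability vector (`default_rng` needs NumPy 1.17+) draws all 10,000 numbers in one call; the 0.8/0.2 weights are an assumption matching the question's 80% figure:

```python
import numpy as np

rng = np.random.default_rng()

a = np.arange(1, 1001)
# Same split as above: 100 frequent numbers, 900 leftovers
p1 = rng.choice(a, size=100, replace=False)
is_frequent = np.isin(a, p1)

# Per-element probabilities: frequent numbers share 0.8, the rest share 0.2
p = np.where(is_frequent, 0.8 / 100, 0.2 / 900)
p /= p.sum()  # renormalize so p sums to 1 up to rounding

out = rng.choice(a, size=10_000, p=p)
share = np.isin(out, p1).mean()  # approximately 0.8, not exactly
```

The trade-off: this version is a single weighted draw, while the index-and-permute version guarantees the exact 80/20 counts.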

Divakar