Getting weighted random values from a list of lists with different list lengths

Question

I need to create a new list that has random values pulled from a list of lists, where the secondary lists may be of different lengths.

The selection must also be weighted by length: if one of the secondary lists is longer than the rest, the probability of obtaining a value from it must be higher than for the shorter secondary lists. Values may be selected more than once, so I don't have to remove a value from the list of lists after it has been chosen.

I was able to create the list of lists, where each secondary list corresponds to a region and its contents are randomly generated client codes. So far so good. But when I use random.choice() to build my new list, I get whole secondary lists picked at random from the list of lists, rather than random values picked from ALL lists.

import random

thislist = []

# So I have my blank list and I am ready to populate the list with, 
# in this case, 10 random values from the list of lists named 'codigo_cliente'

for i in range(10):
    thislist.append(random.choice(codigo_cliente))

Here are the client codes with 30 total clients in this example:

Client codes:

[['A-336', 'A-437', 'A-720', 'A-233', 'A-499'], 
 ['B-664', 'B-133', 'B-267', 'B-421', 'B-553', 'B-910', 'B-792', 'B-719', 'B-550', 'B-946'], 
 ['C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318', 'C-550', 'C-422', 'C-251', 'C-852', 'C-309']]

I am getting the following output, which is not what I want:

This is the random list of clients selected:

[['B-664', 'B-133', 'B-267', 'B-421', 'B-553', 'B-910', 'B-792', 'B-719', 'B-550', 'B-946'], 
 ['A-336', 'A-437', 'A-720', 'A-233', 'A-499'], 
 ['C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318', 'C-550', 'C-422', 'C-251', 'C-852', 'C-309'], 
 ['A-336', 'A-437', 'A-720', 'A-233', 'A-499'], 
 ['B-664', 'B-133', 'B-267', 'B-421', 'B-553', 'B-910', 'B-792', 'B-719', 'B-550', 'B-946'], 
 ['C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318', 'C-550', 'C-422', 'C-251', 'C-852', 'C-309'], 
 ['C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318', 'C-550', 'C-422', 'C-251', 'C-852', 'C-309'], 
 ['C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318', 'C-550', 'C-422', 'C-251', 'C-852', 'C-309'], 
 ['B-664', 'B-133', 'B-267', 'B-421', 'B-553', 'B-910', 'B-792', 'B-719', 'B-550', 'B-946'], 
 ['A-336', 'A-437', 'A-720', 'A-233', 'A-499']]

Instead, I should be getting something like, for example, the following:

thislist = ['A-336', 'B-553', 'C-596', 'B-910', 'C-251', 'C-329', 'B-910', 'A-437', 'B-946', 'C-251'] 

# Notice how there are more values with the "C" prefix from the larger secondary list,
# than values with the A or B prefixes from the smaller secondary lists.

You can get the lengths of all the lists and choose a list based on the lengths, for more detail [check here](https://stackoverflow.com/questions/1761626/weighted-random-numbers). You can use the cumulative sum of the lengths to get the weighted choice. — jkhadka, Oct 23 '19 at 19:26
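A minimal sketch of the cumulative-sum idea from the comment above, assuming a shortened version of the client list for brevity. The helper names `locate` and `pick_one` are made up for illustration. It draws one global index over all elements and uses the running totals of the sublist lengths to map that index back to a sublist and a position inside it:

```python
import bisect
import itertools
import random

# Shortened, illustrative data (not the full list from the question)
codigo_cliente = [
    ['A-336', 'A-437'],
    ['B-664', 'B-133', 'B-267'],
    ['C-755', 'C-533', 'C-596', 'C-877', 'C-400'],
]

# Running totals of the sublist lengths: [2, 5, 10]
cumulative = list(itertools.accumulate(len(sub) for sub in codigo_cliente))
total = cumulative[-1]


def locate(idx):
    """Map a global index (0 <= idx < total) to the element it refers to."""
    sub = bisect.bisect_right(cumulative, idx)           # which sublist
    offset = idx - (cumulative[sub - 1] if sub else 0)   # position inside it
    return codigo_cliente[sub][offset]


def pick_one():
    """Pick one element; longer sublists are hit proportionally more often."""
    return locate(random.randrange(total))


thislist = [pick_one() for _ in range(10)]
print(thislist)
```

Because every global index is equally likely, a sublist with more elements owns more indices and is chosen more often, without building a flattened copy of the data.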

jmm · Answer 1 · 2019-10-23T20:21:14.507

Weighted Choice

random.choices(population, weights=None, *, cum_weights=None, k=1) accepts a list of weights for the random selection (note that k must be passed as a keyword argument). Therefore, you could give it the lengths of the sublists as weights:

weights = [len(c) for c in codigo_cliente]

and let it select a sublist for you (you can also tell it to select a sublist 10 times with k=10). From each of these sublists you can then select an arbitrary list element:

thislist = [random.choice(c) for c in random.choices(codigo_cliente, weights=weights, k=10)]

You can also pull it together for a one-liner solution:

thislist = [random.choice(c) for c in random.choices(codigo_cliente, weights=[len(c) for c in codigo_cliente], k=10)]

Reference: A weighted version of random.choice

Flattened List

If you can afford the additional storage, you can flatten the list and do the selection on the flattened list like this:

import random
import itertools

codigo_cliente = [['A-336', 'A-437', 'A-720', 'A-233', 'A-499'],
                  ['B-664', 'B-133', 'B-267', 'B-421', 'B-553', 'B-910', 'B-792', 'B-719', 'B-550', 'B-946'],
                  [
                      'C-755', 'C-533', 'C-596', 'C-877', 'C-400', 'C-354', 'C-471', 'C-169', 'C-329', 'C-318',
                      'C-550', 'C-422', 'C-251', 'C-852', 'C-309'
                  ]]
thislist = []
temp = list(itertools.chain.from_iterable(codigo_cliente))

for i in range(10):
    thislist.append(random.choice(temp))

print(thislist)

Different approaches to flatten nested lists can be found here: How to make a flat list out of list of lists?

Assuming the OP wants elements to come from each sublist with equal weighting (thereby overweighting elements of short sublists relative to long sublists), flattening breaks the intended weighting. — ShadowRanger, Oct 23 '19 at 19:37
@ShadowRanger the probability of choosing an A-value in the flattened list is lower than that of choosing a B- or C-value, because there are fewer A-values in the flattened list. — jmm, Oct 23 '19 at 19:48
Ah, okay. I misread the OP initially; they actually want an *unweighted* choice among all options, so the nested `list`s aren't helping. Your solution is a good one. I'd recommend using `random.choices` to make all 10 choices at once (the wrapping overhead involved in each call to `random.choice` gets amortized away that way), but otherwise, perfect. — ShadowRanger, Oct 23 '19 at 19:53
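The suggestion in the last comment could look roughly like this: flatten once, then let `random.choices` draw all 10 values in a single call. This is a sketch under the assumption that holding a flattened copy in memory is acceptable; the variable name `flat` is illustrative.

```python
import itertools
import random

codigo_cliente = [
    ['A-336', 'A-437', 'A-720', 'A-233', 'A-499'],
    ['B-664', 'B-133', 'B-267', 'B-421', 'B-553',
     'B-910', 'B-792', 'B-719', 'B-550', 'B-946'],
    ['C-755', 'C-533', 'C-596', 'C-877', 'C-400',
     'C-354', 'C-471', 'C-169', 'C-329', 'C-318',
     'C-550', 'C-422', 'C-251', 'C-852', 'C-309'],
]

# One flat list of all 30 codes; every code is now equally likely.
flat = list(itertools.chain.from_iterable(codigo_cliente))

# Draw 10 values with replacement in a single call.
thislist = random.choices(flat, k=10)
print(thislist)
```

No weights are needed here, since the longer sublists simply contribute more entries to the flattened list.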

score 1 · Answer 2 · answered Oct 23 '19 at 19:35

You are not picking a random item from the nested lists, but a complete nested list.

First pick a random nested list, then choose an item from it at random:

for i in range(10):
    rand_list = random.choice(codigo_cliente)
    thislist.append(random.choice(rand_list))

Saleem Ali


Or just one-line it as: `thislist.append(random.choice(random.choice(codigo_cliente)))` :-) – ShadowRanger Oct 23 '19 at 19:36
Of course. I wrote it in two lines just to explain how the item is picked from the nested lists. – Saleem Ali Oct 23 '19 at 19:39
Note: I up-voted, but on rereading the OP's question, they don't actually want it weighted by sub-`list`, they want an even chance of picking any element. This will pick elements from smaller sub-`list`s more than those from larger sub-`list`s. It's a good solution when you actually want even weighting between sub-`list`s, but not for the OP's particular case. – ShadowRanger Oct 23 '19 at 19:54
I tried this way, it works to the extent that I do get a list with random values, but, with equal probabilities, meaning that, if I have a secondary list with 10 items, a second secondary list with 50 items, and a third secondary list with, let's say, 300 items, and I then run the code to get a sample of 1000 random items, I get around 300 items from each secondary list in the new random list... there should be considerably more from the third secondary list than from the other two. – eurojourney Oct 23 '19 at 21:10
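The effect described in the last comment can be reproduced with a small experiment. This is only a sketch: the `regions` data is made up to match the 10/50/300 sizes mentioned, `prefix_counts` is an illustrative helper, and the seed is fixed only so that reruns give the same numbers.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed, purely for repeatable runs

# Hypothetical regions with 10, 50 and 300 client codes
regions = [[f"{prefix}-{i}" for i in range(size)]
           for prefix, size in (("A", 10), ("B", 50), ("C", 300))]


def prefix_counts(sample):
    """Count how many sampled codes came from each region (by first letter)."""
    return Counter(code[0] for code in sample)


# Nested choice: each region is picked with probability 1/3,
# regardless of how many codes it holds.
nested = [rng.choice(rng.choice(regions)) for _ in range(1000)]

# Length-weighted choice: regions are picked in proportion to their size.
weights = [len(region) for region in regions]
weighted = [rng.choice(region)
            for region in rng.choices(regions, weights=weights, k=1000)]

print("nested:  ", prefix_counts(nested))
print("weighted:", prefix_counts(weighted))
```

With the nested choice, each region should contribute roughly a third of the sample. With the length weights, the 300-item region should dominate, which is the behaviour asked for in the question.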

RootTwo · Answer 3 · 2019-10-23T19:43:24.873

Use random.choices() with the weights argument set to the lengths of the lists. This selects the lists in proportion to their length. Then use random.choice() to select an element from each list. k is the number of items to select:

from random import choice, choices

w = [len(d) for d in codigo_cliente]
[choice(lst) for lst in choices(codigo_cliente, weights=w, k=10)]

Sample output:

['C-400', 'C-596', 'B-553', 'C-471', 'B-133',
 'C-596', 'B-133', 'A-499', 'C-471', 'C-400']
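To see why weighting by length gives every client code the same chance, the probabilities can be worked out exactly. The sketch below assumes the sublist sizes from the question (5, 10 and 15) and uses `fractions.Fraction` to avoid rounding; the variable names are illustrative.

```python
from fractions import Fraction

sizes = [5, 10, 15]   # lengths of the A, B and C sublists
total = sum(sizes)    # 30 codes overall

# Stage 1: probability of picking each sublist, proportional to its length.
per_sublist = [Fraction(n, total) for n in sizes]

# Stage 2: probability of a specific code = P(sublist) * 1/len(sublist).
per_element = [p * Fraction(1, n) for p, n in zip(per_sublist, sizes)]

print(per_sublist)   # chances of landing in A, B, C
print(per_element)   # every entry equals Fraction(1, 30)
```

The length in the weight cancels against the 1/len factor of the second draw, so each code ends up with probability 1/30, the same as picking uniformly from a flattened list.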
