1

I'm trying to build a system where I have a list of Items with weights associated with them. I want to randomly pull items from this pool, using their weights to determine chance of selection.

In the example spreadsheet, I have 10 fruits. I want to pull 3 fruits from the pool. The same fruit cannot be pulled more than once. These 3 fruits would be a fruit-selection.

I want to do this many times, across multiple rows, each providing a new random fruit-selection.

  1. I've come across a couple of issues:
  • Occasionally I've got this error:

    [Error
    Function INDEX parameter 2 value is 8. Valid values are between 0 and 7 inclusive.]
    
  • Sometimes the formula for pulling an item results in an empty cell.

  1. I'm probably not also doing this the most efficient way. I may need up to 300 rows of these 'fruit-selections'
  2. Finally, I would like to determine the number of fruits being drawn to make up each fruit-selection. I've not hooked this up yet as I was trying to get the above working.

I'm hoping there's an efficient way of doing this. Maybe using sequence?

Appreciate any help / advice on this one.

Spreadsheet is here

Peter O.
  • 32,158
  • 14
  • 82
  • 96
  • Are you ok with all results re-randomizing every time you refresh the sheet? If you want to avoid that, I think it would be convenient to use [Apps Script](https://developers.google.com/apps-script/overview) instead. Would you be ok with that? – Iamblichus Dec 10 '21 at 11:46
  • I don't mind if the results randomize on every change – Troy Williams Dec 10 '21 at 12:51

1 Answers1

0

not sure if you realize it but this is contra-productive... you can either have weights or unique values - not both of them because after you assign weights to your items it means that some of them will appear more frequently than the other ones. example:

let's say you want to choose 2 random items from this list

weight     item
_______________
4          mango
2          papaya
3          jackfruit

so you actually want to choose from this probability:

mango
mango
mango
mango
papaya
papaya
jackfruit
jackfruit
jackfruit

now as soon as you introduce "no duplicates" rule you are limiting your choice only to unique items so your choice will be reduced to:

mango
papaya
jackfruit

and at this point, you just lost the whole idea with weights which are just pointless because choosing two random items from the above list will yeild only 3 possible outcomes so the probability (weight) of choosing each item is 1:1:1 eg. the "weight" of 4 mangos becomes 1 mango

mango
papaya

papaya
jackfruit

mango
jackfruit

that being said... the selection of weighted random items would be like this:

=INDEX(QUERY(SORT(FLATTEN(SPLIT(QUERY(
 REPT(C2:C11&",", A2:A11),,9^9), ", ", 1)), 
 RANDARRAY(SUM(A2:A11)), ), "limit "&E2))

enter image description here

enter image description here

and selection of random non-repeating items would be done like this:

=INDEX(QUERY(SORT(FILTER(C2:C11, C2:C11<>""), 
 RANDARRAY(COUNTA(C2:C11)), 1), "limit "&E2))

enter image description here


update:

paste in H2:

=INDEX(QUERY(SORT(FLATTEN(SPLIT(QUERY(
 REPT(C2:C11&",", A2:A11),,9^9), ", ", 1)), 
 RANDARRAY(SUM(A2:A11)), ), "limit 1"))

paste in H3:

=INDEX(QUERY(SORT(FLATTEN(SPLIT(QUERY(
 REPT(C2:C11",", A2:A11),,9^9), ", ", 1)), 
 RANDARRAY(SUM(A2:A11)), ), "where Col1 <> '"&H2&"' limit 1"))

paste in H4 and drag down:

=IFNA(INDEX(QUERY(SORT(FLATTEN(SPLIT(QUERY(
 REPT($C$2:$C$11&",", $A$2:$A$11),,9^9), ", ", 1)), 
 RANDARRAY(SUM($A$2:$A$11)), ), "where not Col1 matches '"&
 JOIN("|", H$2:H3)&"' limit 1")))

and then use:

=OFFSET(H2:H,,,E2)

enter image description here

demo spreadesheet

player0
  • 124,011
  • 12
  • 67
  • 124
  • 1
    Wow, thanks for this. The explanation is super clear and really helpful. I still think I need a bit of both, so maybe I didn't explain properly. Example: When selecting 3 items: - Apple has the highest weight, let's say that item is drawn. - Now remove Apple as an option when drawing the second item. The next highest weight item is (tied: Banana, Cherries, Graps). Most likely one of them will get drawn. - Item 2 is now removed from the list of possible items to draw for item 3. Resulting fruit-selection (with highest probably outcome) is Apple, Banana, Cherries for example – Troy Williams Dec 10 '21 at 12:55
  • 1
    You should (in theory) be able to get a weighted random draw without replacement. It's just that the normal way to get a random draw without dups is by sorting, but weighting doesn't work with that method. The only way I can think of at the moment is to make a selection using weighting, then filter the list to exclude the value(s) already drawn - but that would have to be a pull-across formula, not a single array formula. – Tom Sharpe Dec 10 '21 at 15:16
  • Thanks. Yeah that's what I attempted at first. Might have to go back to that method. Was a bit laboursome but maybe that's the only way. Thanks – Troy Williams Dec 10 '21 at 17:23
  • @TroyWilliams answer updated – player0 Dec 10 '21 at 18:57
  • I thought naively maybe you could divide the random number by the weights to get a weighted random number to sort on, but I think it isn't that simple. I'm trying to get my head round this https://stackoverflow.com/questions/4463561/weighted-random-selection-from-array which sounds promising. – Tom Sharpe Dec 10 '21 at 20:29
  • This does not seem to be working for fruit names with spaces. Any thoughts? It seems to work then if we change the second ", " in the formula with ",", but then it doesn't work with 0 weights (for the last fruit?). – The Coding Wombat Feb 08 '23 at 21:15
  • @TheCodingWombat can you share a copy/sample of your sheet with example of desired result? – player0 Feb 08 '23 at 21:51