0

What's an efficient way to generate N unique vectors of length M (each element a random variable drawn from its own arbitrary distribution pmfm) so that each vector satisfies two rules:

  1. Elements are unique
  2. Elements are integers bounded in the interval (0,M]

For context- I'm performing a Monte Carlo simulation relying on M competitors' rankings after a contest as input, but want to consider only realistic outcomes by modeling the likelihood of each one's placement based on a measure of their skill.

Edit: In this context, I suppose the RVs that compose each vector are not really independent, giving rise to the constraints. In that case, maybe I need to perform Gibbs sampling from an M-dimensional joint pmf. I would need to somehow define such a joint pmf to account for constraints. However, this introduces memory issues since M can be as large as 37.

Danny
  • 181
  • 1
  • 14
  • Does this answer your question? [Weighted random sample without replacement in python](https://stackoverflow.com/questions/43549515/weighted-random-sample-without-replacement-in-python) – Peter O. Feb 29 '20 at 04:41
  • See also: https://stackoverflow.com/questions/15113650/faster-weighted-sampling-without-replacement – Peter O. Feb 29 '20 at 04:42
  • Thanks Peter. It seems that approach might work if all the elements of each vector were drawn from the same weighted population. But in this case the position of the element in the vector should key the weighting of the population to sample from. I guess if there were a way to make the p argument of np.random.choice accept a generator which returns a different array for each sample that might work. – Danny Feb 29 '20 at 18:59
  • As I understand the question, each weight of each competitor follows a nonuniform distribution (e.g., Glicko2); is that correct? In that case, draw the weights for each competitor, then draw competitors without replacement according to those weights. – Peter O. Feb 29 '20 at 19:57
  • Let's say competitors 1,2,3 have skill 1800, 1500, 1200 respectively. Then for example, vector [3 1 2] would represent the lowest-skilled (1200) competitor having placed first, the best competitor (1800) placing second, and the middle competitor (1500) placing last. – Danny Feb 29 '20 at 20:09
  • Also I say arbitrary pmf because I'd like ability to factor in the _gaps_ between the skills, but a reasonable first assumption is that competitor's placement is normally distributed with a mean equal to where their skill ranks. I think you're suggesting that I could use static weights, with the skill measure used to weight each player. Then sampling without replacement should favor the highest skilled competitor being sampled sooner than the second, etc. – Danny Feb 29 '20 at 20:25

0 Answers0