
Using the answers to the question Randomly Generating Combinations From Variable Weights, I've managed to sample without replacement given unequal probabilities. I've implemented this in my particular context and it works great.

Now I have access to additional information about my distribution. Specifically, I've been using a neural network to generate these probabilities so far, and I've now trained my neural network to output a probability for each pair of unique objects, in addition to probabilities for individual objects.

So if the population is [A, B, C, D, E, F] and I'll be sampling groups of size 3, then I have access to the probability that the resulting sample will contain both A and B, or the probability that the resulting sample will contain both C and E, etc. I also still have access to the probabilities that the sample contains any of the individual elements.

The question above has made me realize that sampling is a much harder problem than I had initially expected it to be. Still, I'm wondering if there's any way this information about the probability of these pairs could be used to make my sample better match my actual target distribution.

Using the method discussed in the first answer to the question above, where gradient descent is used to find underlying probabilities that yield the correct conditional probabilities, when supplied the following probabilities:

[0.349003, 0.56702, 0.384279, 0.627456, 0.585684, 0.486558] (for the respective elements of the population from above)

a sample of 3,000,000 groups follows the distribution below, where entry (i, j) of the matrix is the observed frequency with which a sampled group contains both element i and element j:

   A          B          C          D          E          F
A [0.34860067 0.15068367 0.09183067 0.17385267 0.15775267 0.12308167]
B [0.15068367 0.567596   0.168863   0.309292   0.28285467 0.22349867]
C [0.09183067 0.168863   0.384101   0.19396467 0.175895   0.13764867]
D [0.17385267 0.309292   0.19396467 0.62724833 0.32153967 0.25584767]
E [0.15775267 0.28285467 0.175895   0.32153967 0.58571833 0.23339467]
F [0.12308167 0.22349867 0.13764867 0.25584767 0.23339467 0.48673567]
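
For reference, here's a minimal sketch of how such an empirical pairwise matrix can be estimated from samples. The sampler call below is only a placeholder (NumPy's weighted choice without replacement does not reproduce the calibrated inclusion probabilities); in my actual code the gradient-descent-calibrated sampler from the linked answer takes its place:

    import numpy as np

    # Placeholder sampler: np.random.Generator.choice without replacement does
    # NOT hit the calibrated inclusion probabilities -- substitute the sampler
    # from the linked answer. Lower `trials` for a quick test; millions of
    # draws are slow in a pure-Python loop.
    rng = np.random.default_rng(0)
    probs = np.array([0.349003, 0.56702, 0.384279, 0.627456, 0.585684, 0.486558])
    n, k, trials = len(probs), 3, 100_000

    counts = np.zeros((n, n))
    for _ in range(trials):
        group = rng.choice(n, size=k, replace=False, p=probs / probs.sum())
        counts[np.ix_(group, group)] += 1  # bump every (i, j) pair in this group

    pairwise = counts / trials  # entry (i, j): fraction of groups with both i and j
    print(pairwise.round(6))    # the diagonal recovers the individual frequencies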

I want to, for example, be able to provide my sampling algorithm with the following matrix:

   A        B        C        D        E        F
A [0.349003 0.148131 0.15166  0.098827 0.151485 0.147903]
B [0.148131 0.56702  0.140829 0.327032 0.331278 0.18677 ]
C [0.15166  0.140829 0.384279 0.190329 0.09551  0.19023 ]
D [0.098827 0.327032 0.190329 0.627456 0.391803 0.246921]
E [0.151485 0.331278 0.09551  0.391803 0.585684 0.201292]
F [0.147903 0.18677  0.19023  0.246921 0.201292 0.486558]

and have the resulting samples' distribution match this supplied matrix more closely than when only the individual probabilities are used, as was done above.
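
One way to quantify "closer" (the metric choice here is mine, not something from the linked answer): compare the off-diagonal entries of the empirical matrix above against this target matrix, e.g. by maximum and mean absolute deviation. Both matrices are copied straight from the two tables above:

    import numpy as np

    # Empirical pairwise matrix from the 3,000,000-group sample above.
    empirical = np.array([
        [0.34860067, 0.15068367, 0.09183067, 0.17385267, 0.15775267, 0.12308167],
        [0.15068367, 0.567596,   0.168863,   0.309292,   0.28285467, 0.22349867],
        [0.09183067, 0.168863,   0.384101,   0.19396467, 0.175895,   0.13764867],
        [0.17385267, 0.309292,   0.19396467, 0.62724833, 0.32153967, 0.25584767],
        [0.15775267, 0.28285467, 0.175895,   0.32153967, 0.58571833, 0.23339467],
        [0.12308167, 0.22349867, 0.13764867, 0.25584767, 0.23339467, 0.48673567],
    ])

    # Target pairwise matrix I want the sampler to approach.
    target = np.array([
        [0.349003, 0.148131, 0.15166,  0.098827, 0.151485, 0.147903],
        [0.148131, 0.56702,  0.140829, 0.327032, 0.331278, 0.18677 ],
        [0.15166,  0.140829, 0.384279, 0.190329, 0.09551,  0.19023 ],
        [0.098827, 0.327032, 0.190329, 0.627456, 0.391803, 0.246921],
        [0.151485, 0.331278, 0.09551,  0.391803, 0.585684, 0.201292],
        [0.147903, 0.18677,  0.19023,  0.246921, 0.201292, 0.486558],
    ])

    off_diag = ~np.eye(6, dtype=bool)
    print(np.abs(empirical - target)[off_diag].max())   # worst pair deviation
    print(np.abs(empirical - target)[off_diag].mean())  # average pair deviation

A pairs-aware sampler would need to beat these numbers to be worth the extra machinery.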

I'm okay with the sampling algorithm not perfectly matching these probabilities, though it would be nice if that were possible. From my own attempts at this problem, I've realized things are harder because even though P(A and B) is supplied, P(A and B and C) cannot be derived from the information in the matrix alone (a small demonstration follows below). Basically, my question is: is there any way to use this additional information I have (the pair probabilities) to produce samples whose distributions better model my intended distribution than if I used only the probabilities for each individual object?
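
To make that underdetermination concrete, here's a toy construction of my own (not from the linked question): two different distributions over groups of 3 from this same population of 6 that produce identical pairwise matrices, so no algorithm can recover triple probabilities from the pairs alone:

    import numpy as np
    from itertools import combinations

    pop = "ABCDEF"
    triples = [frozenset(t) for t in combinations(pop, 3)]  # all 20 size-3 groups

    # Hand-built signed perturbation whose pairwise marginals all cancel:
    # the signed 4-cycle +AB +CD -AC -BD on {A,B,C,D}, multiplied by +E / -F.
    signs = {frozenset("ABE"): +1, frozenset("CDE"): +1,
             frozenset("ACE"): -1, frozenset("BDE"): -1,
             frozenset("ABF"): -1, frozenset("CDF"): -1,
             frozenset("ACF"): +1, frozenset("BDF"): +1}

    q1 = np.full(len(triples), 1 / 20)                                 # uniform over groups
    q2 = np.array([1 / 20 + 0.05 * signs.get(t, 0) for t in triples])  # still a valid distribution

    def pair_probs(q):
        """P(sample contains both i and j) for every pair, given group probabilities q."""
        return {pair: sum(qi for qi, t in zip(q, triples) if pair <= t)
                for pair in map(frozenset, combinations(pop, 2))}

    pp1, pp2 = pair_probs(q1), pair_probs(q2)
    assert all(abs(pp1[p] - pp2[p]) < 1e-12 for p in pp1)  # identical pairwise matrices
    i = triples.index(frozenset("ABE"))
    print(q1[i], q2[i])  # yet P(sample = {A, B, E}) is 0.05 under one and 0.10 under the other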

Bhaskar
  • How big are the populations that you care about? – David Eisenstat Jul 26 '20 at 21:39
  • My population is a deck of cards, so 52, and I'm choosing 10 cards – Bhaskar Jul 26 '20 at 21:48
  • The context is that I have a neural network, which for a card game, takes data about the game state and uses it to make predictions about the opponent's hand. As of now, I have a neural network which can output a probability for each individual card, as well as each pair, and I want to use these probabilities to sample possible opponent hands. – Bhaskar Jul 26 '20 at 21:55
  • Makes sense. If you had a tiny population, brute forcing the distribution with a solver would be an option, but here it's tricky since 52 choose 10 times 4 bytes per float is already almost 60 GiB just to store the output. – David Eisenstat Jul 26 '20 at 22:02
  • I can't provide an answer, but you can check this slide and references inside it. https://www.researchgate.net/profile/Lennart_Bondesson/publication/260138813_On_Sampling_with_Prescribed_Second-order_Inclusion_Probabilities/links/5a69ddd30f7e9b01f3ee65ff/On-Sampling-with-Prescribed-Second-order-Inclusion-Probabilities.pdf – Yu-Han Lyu Jul 27 '20 at 21:28
  • The relevant reference is https://www.jstor.org/stable/23357227 . It generalizes the approach from the literature that my answer on the other question implemented without citing, but everything gets significantly more complicated. I do wonder if it's possible to get by with just the rejection sampling algorithm and stochastic gradient descent, but this is not really my very specific area of expertise. – David Eisenstat Jul 27 '20 at 23:30
  • I'd ask on https://stats.stackexchange.com/ – David Eisenstat Jul 27 '20 at 23:35
  • Yeah, and I was looking at the paper, and it explains that sometimes the approximation methods they use don't behave properly and diverge or get stuck, requiring some tinkering to fix. For my use case, I expect to sample about a hundred times from a few thousand different unique distributions under a time constraint of 30 seconds, so between that issue and how calculation-heavy this sampling algorithm is, it likely won't be reasonable for me to use. – Bhaskar Jul 28 '20 at 00:33
  • I'm going to hold off on asking on the stats community for now, as I'm starting to realize that even if I find a reasonable algorithm for this, it'll likely exceed my time limit in my context. I'm going to commit to using just the probabilities for individual objects for now. I am curious about this question, though, so I'll likely ask it there at some point in the future. – Bhaskar Jul 28 '20 at 00:35

0 Answers