
I've seen algorithms for weighted sampling without replacement, such as the Efraimidis & Spirakis algorithm explained in the second answer to this question: Faster weighted sampling without replacement. My problem is that I don't have access to weights; instead, I have access to the probability of each element appearing in the sample.

For example, when the Efraimidis & Spirakis algorithm chooses 3 integers from the integers 1 to 6 using the weights [0.995, 0.001, 0.001, 0.001, 0.001, 0.001] for the respective integers, its samples contain each integer with the following empirical probabilities:

1: 1.00000
2: 0.39996
3: 0.39969
4: 0.39973
5: 0.40180
6: 0.39882

(These figures are taken from the same question linked above.)
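For concreteness, here is a minimal Python sketch of the Efraimidis & Spirakis key-based method (the function name is my own). Each item gets the key u**(1/w) with u uniform on (0, 1), and the k items with the largest keys form the sample; the loop below reproduces inclusion frequencies close to the table above.

```python
import random

def weighted_sample_without_replacement(population, weights, k):
    # Efraimidis & Spirakis: assign each item the key u**(1/w) with
    # u ~ Uniform(0, 1), then keep the k items with the largest keys.
    keyed = [(random.random() ** (1.0 / w), item)
             for item, w in zip(population, weights)]
    keyed.sort(reverse=True)
    return [item for _, item in keyed[:k]]

# Estimate each integer's inclusion probability, as in the table above.
weights = [0.995, 0.001, 0.001, 0.001, 0.001, 0.001]
trials = 100_000
counts = dict.fromkeys(range(1, 7), 0)
for _ in range(trials):
    for item in weighted_sample_without_replacement(range(1, 7), weights, 3):
        counts[item] += 1
print({i: round(counts[i] / trials, 5) for i in counts})
```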

The problem is that I only have access to those resulting probabilities. Is there any way to convert these probabilities to weights, or is there another algorithm, or a modification to this algorithm, that can perform sampling without replacement using probabilities rather than weights?

Edit

Based on the comments, I realize that I can't directly recover weights from the probabilities, since the precision of the probabilities is unknown. My context is that I have a neural network producing these probabilities with relatively decent precision, and I don't believe I have a good way of getting the network to produce weights instead. Would it be reasonable to just assume infinite precision in this context and still produce samples that match those probabilities? If so, what would be an approach to sampling from probabilities that are assumed to be exact?

Also, I see the problem with a probability of exactly 1.00 for a number, as it requires that number's weight to be infinitely larger than the other weights, but I believe I could just cap my network's maximum probability at something like 0.999 to avoid that issue. I'm willing to make approximations as long as the samples still match the given probabilities to a decent level of precision.

Bhaskar
  • The problem you have is that you cannot tell, from the distribution alone, the total number of samples involved. You could have 995 ones, 1 two, 1 three, ... or 995,000 ones, 1,000 twos, ... and that makes a difference when you are drawing without replacement. Not sure this exercise makes sense. – Tarik Jul 26 '20 at 01:25
  • I'm not sure I understand. I know I'm choosing three integers from a set of six integers. What are you referring to when you say total number of samples? The number of samples used to determine the probabilities? – Bhaskar Jul 26 '20 at 01:35
  • *"Is there any way to convert these probabilities to weights"*. Not unless you know the probabilities with infinite precision. The weights in your example could be [0.90, 0.02, 0.02, 0.02, 0.02, 0.02] or [0.9995, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], and you could still reasonably expect those probability results. The probabilities tell you that 1 is much more likely than any other number, and all the other numbers are equally likely. But because the probability of 1 is 1.00000, you have no information about how much more likely 1 is than the other numbers. – user3386109 Jul 26 '20 at 01:40
  • I see, that makes sense. Is there any way I could produce samples if I did assume infinite precision? For example, suppose I supplied the probabilities [1, 0.4, 0.4, 0.4, 0.4, 0.4] for the distribution in which every sample contains 1 plus two distinct integers drawn uniformly from 2 to 6; then I know the probabilities are exact. Is there any approach to sampling in that scenario? – Bhaskar Jul 26 '20 at 01:58
  • Does [this answer](https://stackoverflow.com/a/31029620/2144669) help? – David Eisenstat Jul 26 '20 at 02:10
  • Currently looking into it – Bhaskar Jul 26 '20 at 02:17
  • Are you only considering fixed-size sampling? If not, you can use [Poisson sampling](https://en.wikipedia.org/wiki/Poisson_sampling) (see the first sketch after these comments). If you really need fixed-size sampling, then all the probabilities should sum up to an integer. – Yu-Han Lyu Jul 26 '20 at 02:21
  • Yeah, it has to be fixed-size. And I believe the answer linked above does answer my question. Thanks for finding it for me! – Bhaskar Jul 26 '20 at 02:22
  • If you want fixed-size, then a lot of PPS (probability proportional to size) sampling methods can work. One simple method is called systematic sampling (see the second sketch after these comments): randomly shuffle all elements and lay them on the interval [0, k], each occupying a length equal to its probability. Uniformly at random pick one starting point X in [0, 1]. Then pick the k elements whose intervals contain X, X+1, X+2, ..., X+k-1. – Yu-Han Lyu Jul 26 '20 at 02:34
  • @Yu-HanLyu it's a nice approach but doesn't yield a maximum entropy distribution in general. – David Eisenstat Jul 26 '20 at 13:12
  • Any thoughts on this question? https://stackoverflow.com/questions/63105959/sampling-without-replacement-with-unequal-dependent-probabilities – Bhaskar Jul 26 '20 at 21:37
  • @DavidEisenstat If "maximum entropy" is required, then Conditional Poisson sampling might be the way to go. – Yu-Han Lyu Jul 27 '20 at 21:18
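To illustrate the Poisson sampling suggestion above, a minimal sketch (the function name is my own): each element is included independently with its own probability, so the sample size is random rather than fixed, with expected value sum(probabilities).

```python
import random

def poisson_sample(population, probabilities):
    # Poisson sampling: include each element independently with its
    # own inclusion probability; the sample size is random, with
    # expected value sum(probabilities).
    return [item for item, p in zip(population, probabilities)
            if random.random() < p]
```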
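And a sketch of the systematic PPS sampling described above, assuming the inclusion probabilities sum to an integer k and each is at most 1 (the function name is my own). With the exact probabilities [1, 0.4, 0.4, 0.4, 0.4, 0.4] from the comments, it always returns 1 plus two of the remaining integers, each of which is included with probability 0.4.

```python
import random

def systematic_pps_sample(population, probabilities):
    # Lay the shuffled elements end to end on [0, k], each occupying an
    # interval whose length is its inclusion probability (probabilities
    # must sum to the integer k, and each must be <= 1). Pick X uniformly
    # in [0, 1) and select the elements whose intervals contain
    # X, X+1, ..., X+k-1.
    k = round(sum(probabilities))
    items = list(zip(population, probabilities))
    random.shuffle(items)   # random order guards against periodicity effects
    x = random.random()     # one uniform start point in [0, 1)
    points = [x + j for j in range(k)]
    sample, cum, idx = [], 0.0, 0
    for item, p in items:
        cum += p
        while idx < k and points[idx] < cum:
            sample.append(item)
            idx += 1
    return sample

print(systematic_pps_sample(range(1, 7), [1, 0.4, 0.4, 0.4, 0.4, 0.4]))
```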
