-1

I am programming a hardware packet processor, and I am trying to sample packets. My goal is to keep one out of 10 packets. However, I am aware that one cannot just keep every 10th packet as that would not be a correct sampling method.

I use a random number generator, although the number is always between 0 and (2^n) - 1. That is, if the random number is 4 bits in size, the random number generator will generate a number between 0 and 15.

My first approach was to generate a number of size 10 or 11 bits. Say a 10-bit number n would go between 0 and 1023. I was planning to modulo 10 the n random number and pick one out of the 10 numbers. Unfortunately, this is not possible as my hardware packet processor does not contemplate modulo operations without knowing n at compile time.

My second option was to make a simple if:

// between 0 and 1023. random() retrieves a random number with uniform distribution
int<10> n = random();

if n < 102{
  keep_packet()
} else{
  drop_packet()
}

I wonder if this last method is indeed a correct sampling method and as correct as doing n modulo 10.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
ederollora
  • 1,165
  • 2
  • 11
  • 29
  • Your question does not appear to be _about_ any of the tags you used except [tag:sampling]. Beyond that, without knowing exactly what language you're using and what library/framework implements the `random()` function it would not be possible to say for sure whether the code you have is correct. I will note that even with the assumptions you state above, your method is not _exactly_ correct, because the 102 possible values is not _exactly_ 10% of the total range. It may well be close enough for your purposes though, depending on what your _exact_ goal is. – Peter Duniho May 10 '21 at 21:17
  • @PeterDuniho the tags I added were related to SDN, because the device is used in SDN networks and the networking tag seems to encompass the topic in my opinion. Still fair enough if you think they are excessive. You are right, 102 is not exactly 10 percent but also mod 10 is not exactly 10 percent either. The language is P4 but I imagine most people don't probably know about it, that's why I omitted it. The function creates a (pseudo) random number of *k* bit wide, with uniform distribution. My question mostly focused on asking if " *n* mod 10" and n <124 is the same and if both are correct. – ederollora May 10 '21 at 22:48
  • @PeterO. Thank you for your comment. I think we ask a different concept as far as I could read. – ederollora May 10 '21 at 22:48
  • Code review is typically outside of scope here. A Stack Overflow question is expected to be about a specific, clearly-posed, well-specified, narrow, specific issue. If you don't know if there _are_ any issues, this is not the place. – Charles Duffy May 11 '21 at 01:53

1 Answers1

2

It depends on what you mean by "correct".

Assuming n = random() outputs independent uniform random integers in the interval [0, 1023), then checking whether n < 102 will succeed 102/1024 of the time. 102 out of 1024 values will be accepted and the rest rejected. Does it correctly succeed 10% of the time? No. Does it correctly succeed 102/1024 of the time? Yes.

Likewise with the same assumption, checking whether n % 10 == 9 will succeed 102/1024 of the time. 102 out of 1024 values will be accepted (namely those that end in nine) and the rest rejected. Does it correctly succeed 10% of the time? No. Does it correctly succeed 102/1024 of the time? Yes.

There is a third way to proceed: rejection sampling. Keep generating n = random() until n < 1020, then check whether n < 102. This will correctly succeed 10% of the time. The tradeoff, though, is that there is a 4/1024 chance of requiring more than one random number, a (4/1024)^2 chance of requiring more than two, and so on. In fact, rejection sampling will run forever in the worst case (as is the case in general), and stopping the method after K rejections will introduce bias. However, you can make the bias as small as you like by choosing K appropriately (it may or may not be acceptable for you for a hardware device to "freeze").

See also:

Peter O.
  • 32,158
  • 14
  • 82
  • 96