6

The width (number of buckets) and depth (number of hash functions) of a Count-Min sketch determine the accuracy of the frequency estimation retrieved.

From the 2005 paper of the original Count-Min authors:

The parameters w and d can be chosen by setting w=⌈e/ε⌉ and d=⌈ln1/δ⌉, where the error in answering a query is within a factor of ε with probability δ.

From this as described above:

w=⌈e/error⌉

d=⌈ln(1/(1−certainty))⌉

From the 2011 paper by the original Count-Min authors:

Suppose we want an error of at most 0.1 (of the sum of all frequencies), with 99.9 certainty. Then we want 2/w=1/1000, we set w=2000, and (1/2)^d=0.001, i.e. d=log0.001/log0.5 ≤ 10.

resulting in:

w=⌈2/error⌉

d=⌈ln(1−certainty)/ln(1/2)⌉

Yet the error has to be dependent of the total number of elements N that are stored into the sketch. The more the elements the greater the error and probability of error. What would be an appropriate function in order to create the initial sketch?

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065

0 Answers0