0

The number of elements in my set are over a billion 230. I intent to count the occurrence of each element in the set. For this purpose, I want to use count-min sketch. Please suggest how the hash functions should be chosen. The false positive rate of at most 5% is tolerable for my application.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
Curious
  • 65
  • 10
  • What is the value range? Do you have any information about the distribution of the values? – Amnon Shochot Apr 02 '15 at 21:17
  • The range set is collection of strings composed of 4 letters. In other words the universe is set of quaternary strings of length at most 15 – Curious Apr 03 '15 at 04:41

1 Answers1

0

Count-Min Sketch requires 2-wise independent hash functions, but in practice, I strongly recommend MurmurHash. It is fast and robust, works perfectly well for Count-Min Sketch.

xmerge
  • 126
  • 5