The number of elements in my set are over a billion 230. I intent to count the occurrence of each element in the set. For this purpose, I want to use count-min sketch. Please suggest how the hash functions should be chosen. The false positive rate of at most 5% is tolerable for my application.
Asked
Active
Viewed 624 times
0

templatetypedef
- 362,284
- 104
- 897
- 1,065

Curious
- 65
- 10
-
What is the value range? Do you have any information about the distribution of the values? – Amnon Shochot Apr 02 '15 at 21:17
-
The range set is collection of strings composed of 4 letters. In other words the universe is set of quaternary strings of length at most 15 – Curious Apr 03 '15 at 04:41
1 Answers
0
Count-Min Sketch requires 2-wise independent hash functions, but in practice, I strongly recommend MurmurHash. It is fast and robust, works perfectly well for Count-Min Sketch.

xmerge
- 126
- 5