0

I studied count min sketch (CMS) by reading a paper: http://dimacs.rutgers.edu/~graham/pubs/papers/cmsoft.pdf

However i have no experience on reading algorithm so there are some confusion. Can someone help to answer some question about count min sketch.

  • Suppose depth is d, with is w, so how many elements I can add to CMS most so that the does not affect to error rate
  • I want to make test for CMS by adding a lot of elements to CMS then query all key put to CMS. How to calculate error and evaluate the error?
  • If I use CMS for a webservice as a heavy hitter and my web service run day by day, one day, the counter will overflow. So should I create then delete CMS every day??

Thanks!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
Billy
  • 1
  • 1
  • 1
    In your first question, it isn't clear what you mean by "doesn't affect the error rate", but studying the calculations on page 4 might help you to answer your problem. For the second question, you would need to store all elements as-is, so that you can calculate the true answer, or generate the elements in such a way that you already know the true answer before you ask the CMS. Both your questions need to be more precise if you expect us to give a precise answer, though. –  Jun 23 '16 at 08:38
  • @Rhymoid, thank for your comment. I read page 4 many times however I can not get the answer myself. I will ask in another way: suppose I have d and w, then I add n elements to CMS, when n > w there will be collision cause the overestimate counter. The frequency_get_from_CMS > frequency_expected, the more elements added to CMS the overestimate is increase so I want to know the most elements I can add to CMS so that the overestimate is acceptable, and how to calculate it – Billy Jun 24 '16 at 10:49

0 Answers0