0

I understand the basic guarantees of a bloom filter data structure.

Guarantees:

  1. A Negative result is guaranteed to be true.
  2. A Positive result has a probability, known as 'false positive probability', that it is a false Positive.

Let's take the usecase of checking if a user has 'liked' an article or not. We can do this by checking if the concatenation of user.user_id and article.article_id is in the bloom filter.

The context of my question is: the vast majority of such pair of (user_id, article_id) will never be in the bloom filter, i.e. the base probability of any uniformly-random pair that is in the bloom filter is very small.

Using the bloom filter, we increase the probability of detecting a negative (true negative) from a near-1 probability to a definite 1. That is to say, for most of the time, we are computing if a random pair is NOT in the bloom filter for naught because most of the time, it wouldn't be.

My questions are: where exactly is the performance gain in using a bloom filter and perhaps, more generally, what are the criterions in a particular circumstance that would warrant the use of the bloom filter?

Jim
  • 450
  • 2
  • 10
  • 1
    Does this answer your question? [What is the advantage to using Bloom filters?](https://stackoverflow.com/questions/4282375/what-is-the-advantage-to-using-bloom-filters) – Jason C Feb 25 '23 at 10:00

0 Answers0