3

I am currently using hyperloglog to estimate the cardinality of sets (# of unique items)

Its quite trivial to calculate the cardinality for the union of 2 sets and the cardinality for the intersection of 2 sets (|A intersect B| = |A| + |B| - |A union B|)

But I couldn't quite find a way to chain the operators of union and intersection together (note: the method of only allow calculating the cardinality not hyperloglog of intersection, that is, it is possible to get a new hyperloglog via A union B but not A intersect B).

Are there alternative algorithms that can estimate the cardinality of the result of chained union and intersection?

Jal
  • 2,174
  • 1
  • 18
  • 37

1 Answers1

1

Dogged use of Boolean algebra will do the trick, though you may not be happy with the quality if selectivity is low.

|((A n B) u C) n D| =
|(A n B) u C| + |D| - |(A n B) u C u D| =
|(A u C) n (B u C)| + |D| - |(A u C u D) n (B u C u D)| =
|A u C| + |B u C| - |A u B u C| + |D| - |A u C u D| - |B u C u D| + |A u B u C u D|

You should probably double check my math.

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120