I have a problem calculating the Jaccard similarity for sets (bit vectors):

v1 = 10111

v2 = 10011

Size of intersection = 3 (how do we find this?)

Size of union = 4 (how do we find this?)

Jaccard similarity = (intersection/union) = 3/4

But I don't understand how to find the "intersection" and "union" of the two vectors.

Please help me.

AML

1 Answer

Presumably "intersection" and "union" here mean "number of bits set in both values" and "number of bits set in either value". Assuming you're using something like int or long for the vectors, that is:

int intersection = CountBits(v1 & v2);
int union = CountBits(v1 | v2);

Next you just need to implement CountBits. This Stack Overflow question can help you there.
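As a minimal sketch of one way CountBits could be written (not necessarily the approach in the linked question), the version below clears the lowest set bit in a loop until the value is zero. The class name BitVectorJaccard and the Jaccard helper are hypothetical names introduced for illustration, and uint is used instead of int to sidestep sign-bit concerns. Returning 1.0 when both vectors are empty is an assumption; conventions vary.

```csharp
using System;

static class BitVectorJaccard
{
    // Counts the set bits by clearing the lowest set bit on each pass.
    public static int CountBits(uint value)
    {
        int count = 0;
        while (value != 0)
        {
            value &= value - 1; // clears the lowest set bit
            count++;
        }
        return count;
    }

    // Jaccard similarity = |intersection| / |union| of the set bits.
    public static double Jaccard(uint v1, uint v2)
    {
        int intersection = CountBits(v1 & v2); // bits set in both
        int union = CountBits(v1 | v2);        // bits set in either
        // Assumption: two empty vectors are treated as identical.
        return union == 0 ? 1.0 : (double)intersection / union;
    }
}
```

With the vectors from the question, Convert.ToUInt32("10111", 2) and Convert.ToUInt32("10011", 2) give 23 and 19. The AND is 10011 (3 bits set) and the OR is 10111 (4 bits set), so Jaccard returns 3/4.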

Instead of using int or long, you may want to use BitArray. That has And and Or methods, but note that they modify the instance they are called on (and return it), so copy the array first with new BitArray(original) if you need to keep the original values. You'd also need to work out the best way of counting the bits set in a BitArray; array.Cast<bool>().Count(bit => bit) is a simple option.
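The BitArray approach could look roughly like the sketch below. The class and method names (BitArrayJaccard, CountSet, Jaccard) are hypothetical, and the code assumes both arrays have the same length, since And and Or throw an ArgumentException otherwise. It works on copies so the inputs are left untouched.

```csharp
using System.Collections;
using System.Linq;

static class BitArrayJaccard
{
    // Counts the bits that are set; BitArray enumerates as boxed bools.
    public static int CountSet(BitArray bits)
    {
        return bits.Cast<bool>().Count(bit => bit);
    }

    public static double Jaccard(BitArray a, BitArray b)
    {
        // And/Or write their result into the instance they are called on,
        // so each operation runs on a fresh copy of a.
        int intersection = CountSet(new BitArray(a).And(b));
        int union = CountSet(new BitArray(a).Or(b));
        // Assumption: two empty vectors are treated as identical.
        return union == 0 ? 1.0 : (double)intersection / union;
    }
}
```

For example, new BitArray(new[] { true, false, true, true, true }) and new BitArray(new[] { true, false, false, true, true }) correspond to 10111 and 10011 from the question. Counting through LINQ is simple rather than fast; for very long vectors a word-level count may be preferable.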

Jon Skeet
  • I've saved the bits in an array. How do I convert it to a vector? – AML Aug 24 '12 at 06:27
  • @AML: Well you don't need to - you can just count the bits which are present in both arrays, or present in either. Without any more information about how you're representing anything, it's very hard to help you further. Please read http://tinyurl.com/so-hints – Jon Skeet Aug 24 '12 at 06:32