Randomly select two sets; each set contains distinct keys (one key may belong to multiple sets, but a set never contains duplicate keys).
Return an integer representing the number of keys that belong to both sets.
For example, intersect({1,2,3,4}, {3,4,5}) returns 2.
I just need the size of the intersection; I don't need to know exactly which keys are in it.
Are there any data structures that support this kind of operation in less than O(n) time?
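For concreteness, here is a minimal sketch of the baseline I want to beat, in Python (the function name matches the example above); it is the standard hash-set approach and runs in O(min(|a|, |b|)) expected time, i.e. still linear:

```python
# Baseline sketch: exact intersection size via hash sets.
# Runs in O(min(|a|, |b|)) expected time -- the linear bound
# this question is asking how to beat.
def intersect(a: set, b: set) -> int:
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return sum(1 for key in small if key in large)

print(intersect({1, 2, 3, 4}, {3, 4, 5}))  # prints 2
```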
Edit:
Reading the data out does require O(n) time, but that does not lead to the conclusion that the intersection operation itself can't be done in less than O(n) time.
Imagine this scenario:
I have N sets, each containing 100 keys. Reading them in takes N*100 operations. Now I want to know which pair of sets has the largest intersection; that's O(N²) intersection operations. So I want to reduce the complexity of a single intersection operation. I don't really care how much time it takes to read and construct the sets; it's at most N*100, which is nothing compared to the O(N²) intersection operations. A sketch of this pairwise scan is below.
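A sketch of the scenario, assuming the `intersect` baseline above; every pair gets one intersection call, which is the part I want to make cheaper:

```python
from itertools import combinations

# Scan all N*(N-1)/2 pairs and report the pair with the largest
# intersection. Each intersect() call is the operation whose cost
# dominates once N is large.
def largest_intersection_pair(sets):
    best_pair, best_size = None, -1
    for i, j in combinations(range(len(sets)), 2):
        size = intersect(sets[i], sets[j])
        if size > best_size:
            best_pair, best_size = (i, j), size
    return best_pair, best_size
```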
Be aware that there is no way to find the pair of sets with the largest intersection using fewer than O(N²) intersection operations; I can prove that. You must perform all the intersection operations.
(The basic idea: imagine a complete graph with N vertices, each representing a set, and N(N-1)/2 edges, each representing the intersection of the connected pair. Now assign each edge any non-negative weight you want (representing the intersection size); I can always construct N sets that realize exactly those N(N-1)/2 edge weights, by giving each pair of sets weight-many fresh keys shared by that pair alone. Since the weights are arbitrary and independent, no algorithm can skip a pair without risking missing the maximum. That proves my claim; a sketch of the construction follows.)
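A small sketch of that construction in Python (`w` maps each pair of indices to its desired edge weight): for every pair we mint weight-many brand-new keys and put them in exactly those two sets, so no other pair's intersection is affected.

```python
from itertools import count, combinations

# Given N and non-negative weights w[(i, j)] for all pairs i < j,
# build N sets whose pairwise intersection sizes equal those weights.
def build_sets(n, w):
    fresh = count()                      # endless supply of unused keys
    sets = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        for _ in range(w.get((i, j), 0)):
            key = next(fresh)            # this key exists nowhere else
            sets[i].add(key)
            sets[j].add(key)
    return sets
```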