Computing unique intersections of subsets

Question

Given a set S = {s_i : {z_j : z ∈ N} }, what is a time-efficient algorithm for computing the unique sets of intersections of the subsets of S?

For background, I am dealing with several versions of this problem, some larger than others. In the smallest one |S| ≈ 1,000, |s_i| ≈ 10,000 and the values are zip codes.

Tiny example for clarity:

Input: S = {{},{1},{2,3},{3,4,5,6,7,8,9,10}}
Output: {{},{1},{2,3},{3,4,5,6,7,8,9,10},{3}}

|S| = 4 and there are 2⁴ = 16 subsets of S. However, there are only five unique sets of subset intersections. The first four are the members of S themselves. The fifth is {3}. The empty set is already a member of S. All other 10 subset intersections produce empty sets also.

I think what you are looking for is just a simple set intersection algorithm, take a look at [linear intersection algorithm] (http://stackoverflow.com/questions/4642172/computing-set-intersection-in-linear-time) — amas, Apr 04 '13 at 06:23
@amas there are 2^n subsets of S and S has more than 1000 elements. An O(n) intersection algorithm doesn't help. Intersection is not the issue IMO, it's deciding which intersections not to do. — Sim, Apr 04 '13 at 06:25
sorry 4 that, maybe I didnt understand the problem. can you give more details? — amas, Apr 04 '13 at 06:27
@amas I am not sure what other details you require. If you take all 2^|S| subsets of S and intersect them there will be very few unique sets in the 2^|S| results. I am looking for those unique sets. — Sim, Apr 04 '13 at 06:29
@HighPerformanceMark my question asked for the unique intersections (plural) of subsets but I hear you and have added an example. — Sim, Apr 04 '13 at 06:39
@KhaledAKhunaifer I have edited my example to clearly indicate input and output. — Sim, Apr 04 '13 at 07:54
How do you define `subset` and `intersection`? For example, `S1 = {{1}, {2,3}}` and `S2 = {{}, {1}, {2,3}}` are subsets of `S` and their intersection is `(S1 & S2) == {{1}, {2,3}}` i.e., members of an intersection are sets themselves therefore a set of intersections is a set of sets (intersections) of sets (members of `S`). How do you get the output to be just a set of sets? — jfs, Apr 04 '13 at 11:12
@Sim Without addressing J.F.Sebastian's question, your question seems erroneous (as he points out). The intersection of subsets of S must be a set of elements of S, and {3} is no element of S. — G. Bach, Apr 04 '13 at 13:51
Since the worst case is so bad, it would help to know more about your data. — David Eisenstat, Apr 04 '13 at 16:39
@J.F.Sebastian The intersection operation is on the flattened values. I will find a way to explain this better. — Sim, Apr 04 '13 at 21:40
@J.F.Sebastian Call a set of numbers an n-set. Given a set of n-sets S, determine the set of n-sets that can be expressed as the intersection of one or more n-sets belonging to S. — David Eisenstat, Apr 04 '13 at 22:43

David Eisenstat · Accepted Answer · 2013-04-04T13:36:43.823

Here's a fast preprocessing step that might be worthwhile.

Call elements x and y equivalent if, for every set s_i, either both or neither of x and y belong to s_i. Simplify the problem by deleting all elements except one representative of each equivalence class. At the end, expand each representative to its class.

To simplify, scan sets one by one while maintaining a map from each element to a label for its equivalence class, where equivalence is determined with respect to the sets processed so far. Initially, all elements are in the same class, so this map sends each element to the same label. To process a set, initialize an empty map of new labels. For each element x in the set, let k be x's current label. If the key k exists in the new label map, then the value corresponding to k becomes x's new label. Otherwise, we allocate a new label and assign it to x and add a mapping from k to the new label.

Here's the execution on your input.

Initially label = {1: a, 2: a, 3: a, 4: a, 5: a, 6: a, 7: a, 8: a, 9: a, 10: a}.
Scan {}. Nothing happens.
Scan {1}. Change label[1] to b.
Scan {2, 3}. Change label[2] and label[3] to c.
Scan {3, 4, 5, 6, 7, 8, 9, 10}. Change label[3] to d and label[4..10] to e.
At the end, label = {1: b, 2: c, 3: d, 4: e, 5: e, 6: e, 7: e, 8: e, 9: e, 10: e}. Select 1 (b) and 2 (c) and 3 (d) and 4 (e) to represent their classes. All others are deleted.

Interesting. It doesn't solve the problem but it gave me a good idea for a way to approach the entire problem in a different way and so I'll award the answer. — Sim, Apr 05 '13 at 03:15

score 0 · Answer 2 · answered Apr 04 '13 at 08:34

Asymptotic Time Complexity:

n: number of sets, which will change during execution

m: average set size

Time: T(Search-Matching-Sets) x T(Intersection) x SUM { k } for k: 1..n

Time: 2m x 2m x 1/2 n(n+1)

Time: O(4 m^2 x 1/2 n x (n+1)) ~ O(n^2 m^2)

Proposed Algorithm:

FOR i: 1..n
{

    FOR j: i..n
    {

        IF i = j THEN SKIP

        INTERSECTION := FIND-INTERSECTION ( SETS[i], SETS[j] )

        IF INTERSECTION ⊄ SETS[] THEN ADD INTERSECTION TO SETS[]

    }

}

Computing unique intersections of subsets

2 Answers2

Linked