I want to find intersection of sets containing integer values? What is the most efficient way to do it if say you have 4-5 lists with 2k-4k integers?
-
In python you can use a `set`. – embert Feb 12 '14 at 07:05
-
1possible duplicate of [Computing set intersection in linear time?](http://stackoverflow.com/questions/4642172/computing-set-intersection-in-linear-time) – Bernhard Barker Feb 12 '14 at 07:26
2 Answers
In many languages like for example c++ sets are implemented as balanced binary trees so you can directly evaluate set intersection in O(NlogM)
use n as smaller set size by just looking up into the other set in O(logM)
.
Optimization :-
As you want it for multiple sets you can do the optimization that is used in huffman coding
:-
- Use a priority queue of sets which selects smallest set first
- select two smallest sets first evaluate intersection and add it to queue.
- Do this till you get empty intersection set or one set(intersection set) remaining.
Note: Use std::set if using c++

- 6,106
- 3
- 20
- 19
If you have memory to spare:
- Create a set that will hold the number of occurences of each value.
- For each integer I in each of your set, increment the number of occurences of I
- Extract integers with a number of occurences equal to the number of sets
This is theoretically in O(sum of all sets cardinalities + retrieval)
where retrieveal
can be either the range of your integers (if you're using a raw array) or the cardinality of the union of your sets (if you're using a hash table to enumerate the values for which an occurence is defined).
If the bounds of your set are known and small, you can implement it with a simple array of integers big enough to hold the max number of sets (typically a 8 bits char for 256 sets).
Otherwise you'll need some kind of hash table, which should still theoretically be in o(n).

- 8,479
- 1
- 19
- 43