Efficient list intersection algorithm

Question

Given two lists (not necessarily sorted), what is the most efficient non-recursive algorithm to find the set intersection of those lists?
I don't believe I have access to hashing algorithms.

Actually no. I'm at work and I have to program in a statistical modeling environment called eviews. Eviews does not have set intersection built in, and also does not support recursion. I need a quick algorithm because my sets tend to be large and the program needs to be run frequently. Thanks! — , Jan 30 '09 at 21:40
What language are you working in? Maybe your language already provides something that would make this task easier. — Juliet, Jan 30 '09 at 22:11
Are the valores in each list unique? If yes, you could join the lists, sort the result, and look for duplicates. — Fabio Ceconello, Jan 30 '09 at 22:35
How many elements in the sets typically? (e.g. is it worth your time to try to implement a hash, or can you get away with sorting = O(n log n) ?) — Jason S, Jan 30 '09 at 23:41
If the program needs to be run frequently - can you build up a set of intersections as the points are initially added - i.e. never do it more than once and keep your intersections always up to date. — Fortyrunner, Jan 31 '09 at 23:48
What is the data type you are sorting? Sometimes there are characterstics of the data that you can take advantage of in designing an algorithm. — AShelly, Feb 02 '09 at 18:19

score 47 · Answer 1 · answered Jan 30 '09 at 21:39

47

You could put all elements of the first list into a hash set. Then, iterate the second one and, for each of its elements, check the hash to see if it exists in the first list. If so, output it as an element of the intersection.

answered Jan 30 '09 at 21:39

Frank

64,140
93
237
324

This sounds good, but I don't believe I have access to hashing algorithms either. Any suggestions? – Jan 30 '09 at 21:42
6

Then, maybe: * sort list1 (time: n log n) * sort list2 (time: n log n) * merge the two and check for similar entries as you iterate the two sorted lists simultaneously (linear time) – Frank Jan 30 '09 at 21:49
3

I don't have enough points to comment on other threads, but regarding the point that quick sort is recursive: You can implement it without recursion. See here, for example: http://www.codeguru.com/forum/archive/index.php/t-333288.html – Frank Jan 30 '09 at 21:56
Thanks, added the details to my answer – Wookai Jan 30 '09 at 22:00
4

If you have access to arrays, surely you could build your own hash table. Building a reasonable hash function is usually quite simple. – Keith Irwin Jan 31 '11 at 06:58
1

And how do you do it for multiple lists? Say, you have multiple lists and you want a intersection over all of them? In my understanding, it will still go this way: create hash for the first, start iterating over the rest of the lists, and check each of their elements exist in the hash..right? – khan Jul 23 '14 at 17:43
What's the time complexity of this answer? – theonlygusti Oct 18 '20 at 10:35

score 27 · Answer 2 · answered May 23 '10 at 16:22

You might want to take a look at Bloom filters. They are bit vectors that give a probabilistic answer whether an element is a member of a set. Set intersection can be implemented with a simple bitwise AND operation. If you have a large number of null intersections, the Bloom filter can help you eliminate those quickly. You'll still have to resort to one of the other algorithms mentioned here to compute the actual intersection, however. http://en.wikipedia.org/wiki/Bloom_filter

This is a fascinating approach for efficiently determining whether two large sets overlap. — Rick Sladkey, Jun 18 '14 at 01:29

score 10 · Answer 3 · answered Jan 30 '09 at 21:49

10

without hashing, I suppose you have two options:

The naive way is going to be compare each element to every other element. O(n^2)
Another way would be to sort the lists first, then iterate over them: O(n lg n) * 2 + 2 * O(n)

answered Jan 30 '09 at 21:49

Tom Ritter

99,986
30
138
174

And one other: if it's possible to add a property to each element, reset it to zero for all elements in both sets first, then set it to 1 in one of the sets, then scan through the second set finding elements with the property set to 1. This is `O(n + m)` but not always possible. – Roman Starkov May 27 '13 at 17:15
Maybe it can be improved with an O(log n) binary search? – knight Sep 20 '15 at 13:43
6

Just a note that `O(n lg n) * 2 + O(n) * 2` is the same as `O(n lg n)`. – porglezomp Jul 19 '16 at 16:18
First of all sorting a linked list isn't nlogn because you don't have O(1) access, you need to move it into an array. Secondly you need to sort just one list and then do a binary search on it with each element of the first list. – shinzou Nov 20 '17 at 17:29

score 7 · Answer 4 · answered May 23 '10 at 16:56

From the eviews features list it seems that it supports complex merges and joins (if this is 'join' as in DB terminology, it will compute an intersection). Now dig through your documentation :-)

Additionally, eviews has their own user forum - why not ask there_

score 6 · Answer 5 · edited Feb 13 '16 at 07:07

6

in C++ the following can be tried using STL map

vector<int> set_intersection(vector<int> s1, vector<int> s2){

    vector<int> ret;
    map<int, bool> store;
    for(int i=0; i < s1.size(); i++){

        store[s1[i]] = true;
    }
    for(int i=0; i < s2.size(); i++){

        if(store[s2[i]] == true) ret.push_back(s2[i]);

    }
    return ret;
}

edited Feb 13 '16 at 07:07

Abhishek Goel

18,785
11
87
65

answered May 07 '10 at 04:41

quasar

61
1
1

score 6 · Answer 6 · edited Apr 17 '12 at 07:31

6

with set 1 build a binary search tree with O(log n) and iterate set2 and search the BST m X O(log n) so total O(log n) + O(m)+O(log n) ==> O(log n)(m+1)

edited Apr 17 '12 at 07:31

sra

23,820
7
55
89

answered Apr 21 '11 at 02:23

khaja

61
1
1

2

For the binary search tree part, one of the lists still needs to be sorted (that will add an O(m log m) or and O(n log n) to the complexity). This is still a really useful answer, though: in my case, I have two lists containing the same objects, but each sorted according to different object attributes - and I need to get which objects are in both lists. This answer is agnostic to the attribute on which each list is sorted. Thanks! – accidental_PhD Sep 14 '12 at 18:44
2

Actually, building the tree is O(n log n) so it's O((n+m)log n) overall – c-urchin Feb 23 '13 at 00:42

score 3 · Answer 7 · answered Jan 04 '13 at 19:55

Here is another possible solution I came up with takes O(nlogn) in time complexity and without any extra storage. You can check it out here https://gist.github.com/4455373

Here is how it works: Assuming that the sets do not contain any repetition, merge all the sets into one and sort it. Then loop through the merged set and on each iteration create a subset between the current index i and i+n where n is the number of sets available in the universe. What we look for as we loop is a repeating sequence of size n equal to the number of sets in the universe.

If that subset at i is equal to that subset at n this means that the element at i is repeated n times which is equal to the total number of sets. And since there are no repetitions in any set that means each of the sets contain that value so we add it to the intersection. Then we shift the index by i + whats remaining between it and n because definitely none of those indexes are going to form a repeating sequence.

Sorting a linked list can't be nlogn – shinzou Nov 20 '17 at 23:09 — shinzou, Nov 20 '17 at 23:09

score 2 · Answer 8 · answered Jun 16 '16 at 14:11

2

Using skip pointers and SSE instructions can improve list intersection efficiency.

answered Jun 16 '16 at 14:11

Wolf Garbe

184
1
7

Wookai · Answer 9 · 2009-01-30T22:00:23.660

2

First, sort both lists using quicksort : O(n*log(n). Then, compare the lists by browsing the lowest values first, and add the common values. For example, in lua) :

function findIntersection(l1, l2)
    i, j = 1,1
    intersect = {}

    while i < #l1 and j < #l2 do
        if l1[i] == l2[i] then
            i, j = i + 1, j + 1
            table.insert(intersect, l1[i])
        else if l1[i] > l2[j] then
            l1, l2 = l2, l1
            i, j = j, i
        else
            i = i + 1
        end
    end

    return intersect
end

which is O(max(n, m)) where n and m are the sizes of the lists.

EDIT: quicksort is recursive, as said in the comments, but it looks like there are non-recursive implementations

edited Jan 30 '09 at 22:00

answered Jan 30 '09 at 21:47

Wookai

20,883
16
73
86

Isn't quicksort recursive? Or is there a non-recursive version of it? – Jan 30 '09 at 21:50
I wouldn't call that O(max(n,m)). You're doing two sorts, too. – Tom Ritter Jan 30 '09 at 21:51
Is there a non-recursive version of mergesort that could work also? – Jan 30 '09 at 21:51
Heapsort is non-recursive, but adds some data-structure needs. Would it be ok ? – Wookai Jan 30 '09 at 21:57
Heapsort might work. I'll have to look into it. Very frustrating -- this eviews language is clearly not meant for programmers. – Jan 30 '09 at 22:00
Yep, looks so. Good luck with using it ! – Wookai Jan 30 '09 at 22:04
1

There is a non-recursive quicksort. Push the full interval to be sorted onto the stack. Then in a loop, pop then partition the interval. Any intervals that need further sorting are pushed onto the stack. Go back to the top of the loop, pop the partition... Rinse repeat, until the stack is empty. – EvilTeach Apr 29 '11 at 13:19
A nitpick here: quicksort isn't guaranteed to run in O(n log n) time. It is actually a Omega(n^2) algorithm in the worst case. We can only say that the *average* time that quicksort takes is O(n log n). – rbrito Aug 26 '12 at 07:49

Imran · Answer 10 · 2009-01-31T22:39:17.577

1

Why not implement your own simple hash table or hash set? It's worth it to avoid nlogn intersection if your lists are large as you say.

Since you know a bit about your data beforehand, you should be able to choose a good hash function.

edited Jan 31 '09 at 22:39

answered Jan 30 '09 at 21:53

Imran

12,950
8
64
79

score 1 · Answer 11 · edited Oct 06 '11 at 18:37

1

If there is a support for sets (as you call them in the title) as built-in usually there is a intersection method.

Anyway, as someone said you could do it easily (I will not post code, someone already did so) if you have the lists sorted. If you can't use recursion there is no problem. There are quick sort recursion-less implementations.

edited Oct 06 '11 at 18:37

MPelletier

16,256
15
86
137

answered Jan 30 '09 at 21:54

Andrea Ambu

38,188
14
54
77

1. If eviews would support sets, it would probably offer a method for set intersections 2. How can joining two sets help here. The intersection are those elements that are in both sets. When I here joining I think of calculation the union of two sets – f3lix Jan 31 '09 at 22:47
java has support for sets but no built-in intersection functions. – lensovet May 21 '10 at 10:17
2

@lensovet: if it implements java.util.Set there is the java.util.Set.retainAll(Collection) method. Its documentation reads: "If the specified collection is also a set, this operation effectively modifies this set so that its value is the *intersection* of the two sets." – Andrea Ambu May 24 '10 at 08:43

score 1 · Answer 12 · answered Jan 31 '09 at 22:55

1

I second the "sets" idea. In JavaScript, you could use the first list to populate an object, using the list elements as names. Then you use the list elements from the second list and see if those properties exist.

answered Jan 31 '09 at 22:55

Nosredna

83,000
15
95
122

score 0 · Answer 13 · answered Jul 16 '09 at 12:47

0

In PHP, something like

function intersect($X) { // X is an array of arrays; returns intersection of all the arrays
  $counts = Array(); $result = Array();
  foreach ($X AS $x) {
    foreach ($x AS $y) { $counts[$y]++; }
  }
  foreach ($counts AS $x => $count) {
    if ($count == count($X)) { $result[] = $x; }
  }
  return $result;
}

answered Jul 16 '09 at 12:47

1

If you have duplicates in any of arrays you'll get incorrect behaviour. – Slawek Feb 19 '12 at 13:35

score 0 · Answer 14 · answered Oct 19 '15 at 10:08

From the definition of Big-Oh notation:

T(N) = O(f(N)) if there are positive constants c and n 0 such that T(N) ≤ cf(N) when N ≥ n 0.

Which in practice means that if the two lists are relatively small in size say something less 100 elements in each two for loops works just fine. Loop the first list and look for similar object in the second. In my case it works just fine because I won't have more than 10 - 20 max elements in my lists. However, a good solution is the sort the first O(n log n), sort the second also O(n log n) and merge them, another O(n log n) roughly speeking O(3 n log n), say that the two lists are the same size.

minchaej · Answer 15 · 2021-06-30T05:19:14.567

Time: O(n) Space: O(1) Solution for identifying points of intersection.

For example, the two given nodes will detect the point of intersection by swapping pointers every time they reach the end. Video Explanation Here.

public ListNode getIntersectionNode(ListNode headA, ListNode headB) {
    ListNode pA = headA;
    ListNode pB = headB;
    while (pA != pB) {
        pA = pA == null ? headB : pA.next;
        pB = pB == null ? headA : pB.next;
    }
    return pA;
}

Thanks.

Edit

My interpretation of intersection is finding the point of intersection.

For example:

For the given lists A and B, A and B will "meet/intersect" at point c1, and the algo above will return c1. As OP stated that OP has NO access to Hashmaps or some sort, I believe OP is saying that the algo should have O(1) space complexity.

I got this idea from Leetcode some time ago, if interested: Intersection of Two Linked Lists.

From David's comments to other answers, it would seem that he is looking for all the elements that are common to both lists. Can you please summarise the video's information in your post, in particular the interpretation of *intersection* used? — greybeard, Jun 30 '21 at 05:02
@greybeard sure, I have edited my answer to meet your needs. Let me know if anything is confusing. Thanks. — minchaej, Jun 30 '21 at 05:19

Efficient list intersection algorithm

15 Answers15

Edit

Linked

Related