Python: How to find the intersection between two lists based on object's id?

Question

I know if I had two lists of, say integers, I could simply do list(set(list1) & set(list2)) to get the intersection. However, in my two lists, I have mutable objects, namely Nodes. Node is a class that can be initialized with a value.

Without having to do a double for-loop, is there any way to get the intersection of two lists based on their ids? I'm looking for something similar to list(set(list1) & set(list2)).

Update: By id I am referring to the built-in id() function in Python which returns the address of where the object is stored in memory.

So, I'm asking what is the Intersection of say, [Node1, Node2, Node3] and [Node100, Node2, Node3]. Obviously I can't use the set intersection method above. I need to identify they are the same by accessing memory. If I can't try to identify them based on their value attribute because they may Node1 may have the same value as Node100, but they are not the same objects in memory.

What do you mean "based on their ids"? Is the id an attribute of `Node` object? — Philip Tzou, Feb 07 '19 at 01:12
By 'id' do you mean references to the same objects which may appear in both `list1` and `list2`? — Valentino, Feb 07 '19 at 01:20
@MadPhysicist Philip has a much better answer. there's no point for me to keep mine up; I upvote his answer and move on. — Primusa, Feb 07 '19 at 01:24
To be sure as is not clear to me: do you want to keep in your final list two Nodes with the same value if they are not the same object in memory or not? — Valentino, Feb 07 '19 at 01:40

Philip Tzou · Accepted Answer · 2019-02-07T02:02:34.903

4

There's no need to intersect two sets. In this case you can just check if the id() exists in another set.

set2 = {id(n) for n in list2}
result = [n for n in list1 if id(n) in set2]

The complexity of this code is O(n1 + n2). I'll explain this in following equivalent but more readable code:

set2 = {id(n) for n in list2}  # O(n2)
result = []
for n in list1:  # O(n1)
    if id(n) in set2:  # O(1)
        result.append(n)  # O(1)

In total it's O(n1 + n2).

There is also an alternative solution if you can make change to the Node class by just defining the __hash__ and __eq__ method.

class Node:
    ...

    def __hash__(self):
        return id(self)

    def __eq__(self, another):
        return id(self) == id(another)


list1 = [...]
list2 = [...]

result = set(list1) & set(list2)

edited Feb 07 '19 at 02:02

answered Feb 07 '19 at 01:19

Philip Tzou

5,926
2
18
27

I mean the python `id` built-in function, not my attribute – TheRealFakeNews Feb 07 '19 at 01:24
Also, this is just a for loop – TheRealFakeNews Feb 07 '19 at 01:24
@MadPhysicist well you have to convert list to set first. Internally the `set()` initializer will also walk through the whole list (and I think it counts a loop). – Philip Tzou Feb 07 '19 at 01:25
@AlanH. This is not just a for loop. The set construction is O(n2), and the loop is O(n1). This makes it O(n1+n2), not O(n1*n2). For comparable list sizes, you end up with O(n) instead of O(n^2). And you can trivially replace n.id with id(n) – Mad Physicist Feb 07 '19 at 01:27
@MadPhysicist Can you explain more as to why it's O(list1 + list2)? If I have a for loop, and for each iteration, I have to create a set of `k` elements, isn't that just multiplying the run-times? – TheRealFakeNews Feb 07 '19 at 01:32
@Philip. `return hash(id(self))`? Or is the hash of an integer just itself? – Mad Physicist Feb 07 '19 at 01:37
@MadPhysicist actually by default `__hash__` returns what `id() / 16` returns... (see https://stackoverflow.com/a/11324771/2644759) – Philip Tzou Feb 07 '19 at 01:40

Olivier Melançon · Answer 2 · 2019-02-07T02:37:43.720

The solution you suggested will work.

class Node:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "Node {}".format(self.value)

nodes1 = [Node(1), Node(2), Node(3)]
nodes2 = nodes1[:2] + [Node(4)]

common_nodes = set(nodes1) & set(nodes2)

print(common_nodes) # {Node 2, Node 1}

The reason this works is because despite being mutable, an instance of a class for which you did not define __hash__ or __eq__ will be hashed and compared by its id by default because it inherits those methods from object.

You can check this is true with the following experiment.

>>> obj = object()
>>> hash(obj)
155115580943
>>> id(obj)
2481849295088
>>> id(obj) // 16 == hash(obj)
True

Python: How to find the intersection between two lists based on object's id?

2 Answers2