4

I know if I had two lists of, say integers, I could simply do list(set(list1) & set(list2)) to get the intersection. However, in my two lists, I have mutable objects, namely Nodes. Node is a class that can be initialized with a value.

Without having to do a double for-loop, is there any way to get the intersection of two lists based on their ids? I'm looking for something similar to list(set(list1) & set(list2)).

Update: By id I am referring to the built-in id() function in Python which returns the address of where the object is stored in memory.

So, I'm asking what is the Intersection of say, [Node1, Node2, Node3] and [Node100, Node2, Node3]. Obviously I can't use the set intersection method above. I need to identify they are the same by accessing memory. If I can't try to identify them based on their value attribute because they may Node1 may have the same value as Node100, but they are not the same objects in memory.

TheRealFakeNews
  • 7,512
  • 16
  • 73
  • 114

2 Answers2

4

There's no need to intersect two sets. In this case you can just check if the id() exists in another set.

set2 = {id(n) for n in list2}
result = [n for n in list1 if id(n) in set2]

The complexity of this code is O(n1 + n2). I'll explain this in following equivalent but more readable code:

set2 = {id(n) for n in list2}  # O(n2)
result = []
for n in list1:  # O(n1)
    if id(n) in set2:  # O(1)
        result.append(n)  # O(1)

In total it's O(n1 + n2).


There is also an alternative solution if you can make change to the Node class by just defining the __hash__ and __eq__ method.

class Node:
    ...

    def __hash__(self):
        return id(self)

    def __eq__(self, another):
        return id(self) == id(another)


list1 = [...]
list2 = [...]

result = set(list1) & set(list2)
Philip Tzou
  • 5,926
  • 2
  • 18
  • 27
  • I mean the python `id` built-in function, not my attribute – TheRealFakeNews Feb 07 '19 at 01:24
  • Also, this is just a for loop – TheRealFakeNews Feb 07 '19 at 01:24
  • @MadPhysicist well you have to convert list to set first. Internally the `set()` initializer will also walk through the whole list (and I think it counts a loop). – Philip Tzou Feb 07 '19 at 01:25
  • @AlanH. This is not just a for loop. The set construction is O(n2), and the loop is O(n1). This makes it O(n1+n2), not O(n1*n2). For comparable list sizes, you end up with O(n) instead of O(n^2). And you can trivially replace n.id with id(n) – Mad Physicist Feb 07 '19 at 01:27
  • @MadPhysicist Can you explain more as to why it's O(list1 + list2)? If I have a for loop, and for each iteration, I have to create a set of `k` elements, isn't that just multiplying the run-times? – TheRealFakeNews Feb 07 '19 at 01:32
  • @Philip. `return hash(id(self))`? Or is the hash of an integer just itself? – Mad Physicist Feb 07 '19 at 01:37
  • @MadPhysicist actually by default `__hash__` returns what `id() / 16` returns... (see https://stackoverflow.com/a/11324771/2644759) – Philip Tzou Feb 07 '19 at 01:40
0

The solution you suggested will work.

class Node:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "Node {}".format(self.value)

nodes1 = [Node(1), Node(2), Node(3)]
nodes2 = nodes1[:2] + [Node(4)]

common_nodes = set(nodes1) & set(nodes2)

print(common_nodes) # {Node 2, Node 1}

The reason this works is because despite being mutable, an instance of a class for which you did not define __hash__ or __eq__ will be hashed and compared by its id by default because it inherits those methods from object.

You can check this is true with the following experiment.

>>> obj = object()
>>> hash(obj)
155115580943
>>> id(obj)
2481849295088
>>> id(obj) // 16 == hash(obj)
True
Olivier Melançon
  • 21,584
  • 4
  • 41
  • 73