24

If I have an object that compares equal to an element of a Python set, but is not the same object, is there a reasonable way to get a reference to the object in the set? The use case would be using the set to identify and share duplicated data.

Example (Python 2.7):

>>> a = "This is a string"
>>> b = "This is a string"
>>> a is b
False
>>> a == b
True
>>> s = set((a,))
>>> b in s
True

How to get a reference to a using b and s? I can think of one way, but I'm not sure if it is not implementation-dependent whether you get a or b. EDIT: This does not work when s has more than one element; intersection is quite naturally implemented something like [x for x in smaller_set if x in larger_set]

>>> for x in set((b,)).intersection(s): c = x
...
>>> c is a
True

Perhaps a good workaround would be to use a dict that maps each key to itself, instead of the set.

Janne Karila
  • 24,266
  • 6
  • 53
  • 94
  • 2
    If you need a specific one of two equal, hashable objects, it seems likely that the objects shouldn't be equal and/or hashable. Why do you need this? –  Dec 23 '11 at 12:54
  • I think your suspicions are justified: pypy 1.7.0 and ironpython 3.0 both (can) return False for your final c is a. – DSM Dec 23 '11 at 12:59
  • I could save memory by changing references to equal object to references to the same object. – Janne Karila Dec 23 '11 at 13:01
  • I'll be fine using a dict, like I mention in the last sentence. I'm asking if I'm missing something here. – Janne Karila Dec 23 '11 at 13:05

2 Answers2

6

I found a similar question on python-list: Get item from set. There is a clever answer with reference to get_equivalent(container, item) (Python recipe).

The trick is to construct a wrapper object for the 'key' object, and check if wrapper is in the set using the in operator. If the wrapper hashes equal to the key, its __eq__ method can gain access to the object in the set, and save a reference to it. An important point from the discussion is that the __eq__ method of the set elements must return NotImplemented for unrecognized types, otherwise the wrapper's __eq__ may not get called.

Janne Karila
  • 24,266
  • 6
  • 53
  • 94
3

Your use case sounds like it is a use case for dictionaries. Use, as keys, the attribute of the object that compares equal to the "foreign" object, and as values the desired objects themselves.

If it is a simple use case, and you can have a linear search, however, you could do the obvious - it would not be bad:

def get_equal(in_set, in_element):
   for element in in_set:
       if element == in_element:
           return element
   return None 

If you need what exactly what you ar asking for (I can wonder some use cases for that) - the way to go is to create a custom dictionary class that has a set as one of its members, do implement proxy methods to the member set, and in both dictionary and set methods, keeps sync of both the dictionary and set contents. This would be time consuming to implement right, but relatively straightforward, and have a O(1) time.

If having to copy references to all data around is not an issue (this is linear, but probably worst than the straightforward search above), you can use the expression

(data - (data - {key})).pop()

as in:

In [40]: class A:
    ...:     def __init__(self, id, extra):
    ...:         self.id = id
    ...:         self.extra = extra
    ...:     def __eq__(self, other):
    ...:         return self.id == other.id
    ...:     def __hash__(self):
    ...:         return hash(self.id)
    ...:     def __repr__(self):
    ...:         return f"({self.id}, {self.extra})"
    ...: 
    ...: 

In [41]: data = set(A(i, "initial") for i in range(10))

In [42]: (data - (data - {A(5, None)})).pop()
Out[42]: (5, initial)
jsbueno
  • 99,910
  • 10
  • 151
  • 209