Is there an equivalent to python set
for non-hashable objects? (For instance custom class that can be compared to one another but not hashed?)

- 12,889
- 11
- 68
- 115
-
What object would that be? – Sep 16 '13 at 08:53
-
A string that should compared using the difflib library – Tzach Sep 16 '13 at 08:54
-
Strings are hashable. – Sep 16 '13 at 08:54
-
2You can define the `__hash__` method for custom classes – perreal Sep 16 '13 at 08:55
-
Not really; you'd still have to create a hashable representation of the objects; that process can be encapsulated but *differs* from specific object to specific object. – Martijn Pieters Sep 16 '13 at 08:55
-
You'd be better off making the objects hashable instead. – Martijn Pieters Sep 16 '13 at 08:55
-
The string is wrapped in a class that only implement __eq__ because the comparison must be made using difflib, and hash has no meaning. – Tzach Sep 16 '13 at 08:55
-
@MartijnPieters I can't create a hashable class because it's not an exact matching, but fuzzy matching. – Tzach Sep 16 '13 at 08:57
-
3@Tzach: Then sets are not the way to go anyway; you have to grade each input against the existing strings, fuzzily. That's not a set operation *at all*. – Martijn Pieters Sep 16 '13 at 08:59
-
I know, and i also know how to implement it, but i thought maybe there is an existing class. – Tzach Sep 16 '13 at 09:00
-
You can always use a `list` here; `someobj in listvalue` will use equality to test against each stored entry. That's as close as you'll get, since that is what your 'custom' set would have to do *anyway*. – Martijn Pieters Sep 16 '13 at 09:03
2 Answers
If your values are not hashable, then there is no point in using set
.
Just use a list
instead. If all your objects can do is test for equality, then you'd have to scan each element every time to test for membership. obj in listvalue
does just that, scan the list until an equality match is found:
if not someobj in somelist:
somelist.append(someobj)
would give you a list of 'unique' values.
Yes, this is going to be slower than a set, but sets can only achieve O(1) complexity through hashes.
If your objects are orderable, you could speed up operations by using the bisect
module to bring down tests to O(log N) complexity, perhaps. Make sure you insert new values using the information gleaned from the bisection test to retain the order.

- 1,048,767
- 296
- 4,058
- 3,343
-
Am I right in thinking that intersection/difference on such lists are going to be quadratic and there's no way to improve that (assuming unsortability)? – georg Sep 16 '13 at 09:24
-
1"there is no point in using a set" - true iff you mean `set()`, since it will not work. your answer implements the set api using `list`, which is the way to go, but should have similar interface. – Elazar Jul 05 '15 at 19:17
-
@Elazar: this is miles away from a set. Yes, I mean the built-in `set()` type, where you can do O(1) membership tests and set operations like union, difference and intersection. – Martijn Pieters Jul 05 '15 at 20:58
-
1I believe we both agree on the bottom line. It's just that your wording ("no point in using a set") is a bit confusing. – Elazar Jul 06 '15 at 15:09
-
I disagree - using a list in this case in more tedious than using a set (or a set-like object). There's no reason you can't define a custom class that acts like a set, using a list as the backing storage. It obviously won't have the same performance as a real set, but it would be much nicer to work with than a list. – JesusFreke Dec 19 '18 at 06:08
-
@JesusFreke: nothing in my answer precludes you from creating your own wrapper class. That doesn't change the fact you can't use the *built-in set* for non-hashable objects. This question is about the built-in set type. – Martijn Pieters Dec 19 '18 at 17:38
-
Nothing in your answer precludes me from doing anything :). Other than maybe truthfully saying that I haven't read your answer. But I do disagree with your sentiment that there is no point in looking for/creating/using a set-equivalent object for non-hashable types. – JesusFreke Dec 19 '18 at 20:21
-
@JesusFreke: you are misunderstanding that sentence then. When we talk about `set` in Python, we mean the built-in type, not the protocol. That'd be `collections.abc.Set` and `collections.abc.MutableSet`. – Martijn Pieters Dec 19 '18 at 20:40
-
1But the question isn't asking if you can use the built-in set with a non-hashable object. The whole basis of the question is that you can't. It's asking for an equivalent object. Of course, the exact meaning of "equivalent" in this context is potentially unclear. But given that the question specifically mentions comparable items, it's probably safe to assume it's not looking for a performance-equivalent solution, but rather a syntax-equivalent solution. – JesusFreke Dec 19 '18 at 20:55
-
@JesusFreke: yes, and my advise is to just stick with a list in that case. Note that in the comments the OP shared that their objects aren't strictly comparable either (*it's not an exact matching, but fuzzy matching*). – Martijn Pieters Dec 19 '18 at 21:48
-
@JesusFreke: Note that the sortedset class you promote in your answer is essentially the same approach as the orderable / bisect setup, using a tree (see https://en.wikipedia.org/wiki/Order_statistic_tree). It boils down to the exact same thing. – Martijn Pieters Dec 19 '18 at 21:54
-
1Yes, obviously. The question was asking if there was an equivalent to the set for non-hashable objects. My answer mentioned one equivalent that I was able to find. This seems like a more useful answer to me, and the type of answer I was looking for when I viewed this question. I am fully aware of how to implement such a class (as was the original asker of the question, based on their comments) - but I would much rather find and use a pre-existing one, rather than re-inventing the wheel. – JesusFreke Dec 19 '18 at 22:08
There is the sortedset class from the blist library, which offers a set-like api for comparable (and potentially non-hashable) objects, using a storage mechanism based on a sorted list.

- 19,784
- 5
- 65
- 68