Unexpected behaviour in Python OrderedSet.issuperset()

Question

I have two OrderedSets and I'm trying to check whether one in a subset of the other - both the elements and their order is important. However, the orderedset package is giving me strange results.

>>> import orderedset
>>> a = orderedset.OrderedSet([433, 316, 259])
>>> b = orderedset.OrderedSet([433, 316, 69])
>>> a.issuperset(b)
True

This doesn't make any sense to me because b contains a value (69) that is definitely not in a. Why is a a superset of b then?

However, when I try this:

>>> c = orderedset.OrderedSet([1, 2, 3])
>>> d = orderedset.OrderedSet([1, 2, 4])
>>> c.issuperset(d)
False

This behaviour seems inconsistent to me: why should the choice of values - [433, 316, 259] versus [1, 2, 3] - in the OrderedSet affect the output of issuperset()?

Perhaps there's a better way to do this? I need to know whether the elements in b are contained in a in the same order. Meaning that, if

a = OrderedSet([433, 316, 259])

I'm looking for partial matches within that set that start with the same starting value as a (433). This is what I want out of it:

OrderedSet([433, 316, 259])
OrderedSet([433, 316]])
OrderedSet([433])

And not:

OrderedSet([433, 259])
OrderedSet([316, 259])
OrderedSet([433, 316, 69])
OrderedSet([433, 259, 316])
OrderedSet([259, 433])
...

Basically, if this really confusing - I have an ordered set and I'm trying to find partial matches both in terms of the values and their order.

I don't think `issuperset` is going to work properly here -- it usually assumes sorted sets. — rlbond, Jun 18 '15 at 18:36

Martin Konecny · Accepted Answer · 2015-06-18T18:38:26.357

4

Presumably you are using this third party module since Python doesn't have a built-in orderedset.

A quick look through the source code on github shows that the issuperset function is implemented as

def issuperset(self, other):
    return other <= self

Taking a look at how the less than or equal operator is defined for orderedset:

def __le__(self, other):
    if isinstance(other, _OrderedSet):
        return len(self) <= len(other) and list(self) <= list(other)

So essentially, when comparing two ordered sets they are first converted to lists, and then the Python built-in <= is used to compare the two lists. When you compare two lists using <=, it's similar to a lexical string comparison meaning it compares matching indices of both lists.

According to their implementation, [433, 316, 259] is a superset of [433, 316, 69] (all matching indices of the first list are greater that or equal of the second list), whereas [433, 316, 259] would not be a superset of [433, 316, 260] (the second list has a bigger value in its last index than the first list).

What they probably wanted to write was

return len(self) <= len(other) and set(self) <= set(other)

Which would use the <= defined for the built-in Python sets, and properly test for subsets and supersets.

You may want to report this issue here.

edited Jun 18 '15 at 18:38

answered Jun 18 '15 at 18:33

Martin Konecny

57,827
19
139
159

Thanks, I think I understand the problem now. I guess part of it is that defining superset/subset for an ordered set is ambiguous. I guess I should write something that is specifically suited to my needs rather than rely on an external definition. I'll report the bug so that they know about it. If you have a suggestion for how I implement this (helpful constructs, etc.), I'd really appreciate it. – Navaneethan Santhanam Jun 18 '15 at 18:45
Do you care about order when testing for superset or subset? By definition a set has no order, so this behaviour is undefined really. – Martin Konecny Jun 18 '15 at 18:52
I'm interested in order as much as membership - I'd like to know how many partial matches there are to my ordered set. So in the example, `[433, 316, 259]` represents a sequence of actions that users did on a website. I'd like to know, for instance, how many users completed the entire set of actions, how many completed just the first 2 actions in the sequence `[433, 316]`, and how many just stopped after completing the first `[433]`. Perhaps there's a better way to represent this information and process it? – Navaneethan Santhanam Jun 18 '15 at 19:58
Based on your updated question, it seems that you don't really need the properties of a set, you just want ordering. In that case you could just use a list. Your problem seems to boil down to "does a sublist exist in a list". If that's the case, try taking a look here: http://stackoverflow.com/questions/10106901/elegant-find-sub-list-in-list – Martin Konecny Jun 18 '15 at 22:59

Unexpected behaviour in Python OrderedSet.issuperset()

1 Answers1