5

I need a bag/multiset-like data type in Python. I understand collections.Counter is often used for this purpose. But the comparison operators don't seem to work:

In [1]: from collections import Counter

In [2]: bag1 = Counter(a=1, b=2, c=3)

In [3]: bag2 = Counter(a=2, b=2)

In [4]: bag1 > bag2
Out[4]: True

This seems like a bug to me. I expected the less-than and greater-than operators to perform set-like subset and superset comparisons. But if that were the case then bag1 > bag2 would be false because bag2 contains an extra 'a'. There also don't seem to be subset/superset methods on Counter objects. So I have two questions:

  1. What comparison logic is used for Counter objects?
  2. How can I compare Counter objects for subset, superset, proper-subset, and proper-superset?
William Reed
  • 101
  • 4
  • Have you read the documentation? – jonrsharpe Sep 04 '14 at 18:41
  • Yes, if you mean the ones here: https://docs.python.org/2/library/collections.html#collections.Counter They describe it as a bag/multiset, I assumed that meant the comparison operators would be meaningful. – William Reed Sep 04 '14 at 18:54
  • I had not seen the linked question before and I see how it's related but it's got a couple issues: no description of how to implement proper-subset though I guess that `<=` and `!=` though I don't know if there's a better way to implement and superset though I guess that's the inverse of subset. Also the accepted answer has a typo which a comment tries to call attention to. Not sure if someone can edit that. – William Reed Sep 04 '14 at 19:00
  • `set('abc')>set('ab')` is also `True` – dawg Sep 04 '14 at 19:28

3 Answers3

2

On Python 2, the comparison falls back to the default sort order for dictionaries (Counter is a subclass of dict).

Mappings (dictionaries) compare equal if and only if their sorted (key, value) lists compare equal. [5] Outcomes other than equality are resolved consistently, but are not otherwise defined. [6]

On Python 3, the comparison raises a TypeError:

Mappings (dictionaries) compare equal if and only if they have the same (key, value) pairs. Order comparisons ('<', '<=', '>=', '>') raise TypeError.

Steven Rumbalski
  • 44,786
  • 9
  • 89
  • 119
  • I'm Python 2.7, sorry forgot to mention. – William Reed Sep 04 '14 at 18:48
  • Also, comparison by memory address doesn't seem to happen here: id(bag1) is 4371127560 and id(bag2) is 4371127856. So `bag2` should be greater than `bag1`. – William Reed Sep 04 '14 at 18:50
  • @WilliamReed: The ordering you are seeing is arbitrary. – Steven Rumbalski Sep 04 '14 at 18:50
  • will it change during the life of the Python program? Will it change from execution to execution of the Python program? – William Reed Sep 04 '14 at 18:52
  • @WilliamReed: It should not change during the life a Python program. It *may* change from execution to execution. – Steven Rumbalski Sep 04 '14 at 18:53
  • 5
    It doesn't fall back to the default ordering; it falls back to the [dict implementation of `__cmp__`](http://hg.python.org/cpython/file/f17ab9fed3b0/Objects/dictobject.c#l1792), because `Counter` inherits from `dict`. – user2357112 Sep 04 '14 at 18:59
  • @user2357112 Thanks for the reference. That's much more comforting than the thought that it could change between executions. – William Reed Sep 04 '14 at 19:02
  • @user2357112: Updated as such. – Steven Rumbalski Sep 04 '14 at 19:04
  • 1
    @WilliamReed: user2357112 is certainly correct, but it's a small comfort. What does it mean to say one dictionary is less than another? In Python 2 the answer to that question is well defined, but was deemed so unuseful as to be done away with in Python 3. – Steven Rumbalski Sep 04 '14 at 19:21
1

This unanswered question is of interest:

How can I compare Counter objects for subset, superset, proper-subset, and proper-superset?

By defining the missing “rich comparison methods". You could also use free functions instead, which will make client code more explicit.

from collections import Counter

class PartiallyOrderedCounter(Counter):

    def __le__(self, other):
        """ Multiset inclusion """
        return all( v <= other[k] for k,v in self.items() )


    def __lt__(self, other):
        """ Multiset strict inclusion """
        return self <= other and self != other


    # TODO : __ge__ and __gt__
    # Beware : they CANNOT be written in terms of __le__ or __lt__


a = PartiallyOrderedCounter('abc')
b = PartiallyOrderedCounter('ab')
c = PartiallyOrderedCounter('abe')

assert a <= a
assert not a < a    
assert b <= a
assert b < a
assert not a < b    
assert not c <= a
assert not a <= c
YvesgereY
  • 3,778
  • 1
  • 20
  • 19
  • 1
    I wonder why this doesn't come with Counter. I can think of one application where this would be useful (cyclopeptide sequencing) and there must be others. Upvoting... – Blaise Zydeco Jan 29 '21 at 19:16
  • `__ge__(self, other)` can in fact be written as equivalent to `__le__(other, self)`, and similarly for `__gt__`/`__lt__`. However, `functools.total_ordering` will use `not __lt__(self. other)` for `__ge__`, which is not acceptable. – Karl Knechtel Aug 01 '22 at 21:04
1

Rather than using all to iterate over the Counter values, we can implement the "multiset inclusion" check like so:

from collections import Counter
from functools import total_ordering

class PartiallyOrderedCounter(Counter):
    def __le__(self, other):
        """ Multiset inclusion """
        test = self.copy()
        test.subtract(other)
        return not -test
    
    def __lt__(self, other):
        """ Multiset strict inclusion """
        return self <= other and self != other

This is probably less efficient, but interesting enough to be worth mentioning.

How it works: The Counter's implementation of unary negation (as well as the binary - operator) negates each counter value in the result, but then also strips out everything that is negative or zero. The .subtract method, meanwhile, operates in place (necessitating a copy), but allows the result to have negative counts. Thus, when we negate the result from the subtraction, we get a Counter of only values that were "missing" from other (creating negative counts that were negated into positive ones). If and only if there are no such values, the counter object is Falsey, and we return True by applying the not logical operator.

h/t @PM2Ring for the technique.

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153