4

I'm trying to use queue.PriorityQueue in Python 3(.6).

I would like to store objects with a given priority. But if two objects have the same priority, I don't mind PriorityQueue.get to return either. In other words, my objects can't be compared at integers, it won't make sense to allow them to be, I just care about the priority.

In Python 3.7's documentation, there's a solution involving dataclasses. And I quote:

If the data elements are not comparable, the data can be wrapped in a class that ignores the data item and only compares the priority number:

from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any=field(compare=False)

Alas, I'm using Python 3.6. In the documentation of this version of Python, there's no comment on using PriorityQueue for the priorities, not bothering about the "object value" which wouldn't be logical in my case.

Is there a better way than to define __le__ and other comparison methods on my custom class? I find this solution particularly ugly and counter-intuitive, but that might be me.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
vincent-lg
  • 539
  • 7
  • 15
  • Do your objects actually throw an exception if you compare them to each other? If not, then there's nothing to worry about. – jasonharper Jan 03 '19 at 18:46
  • @jasonharper just because two objects compare without an error being thrown does not mean that the objects are sorting in a reasonable order. Look at this question: https://stackoverflow.com/questions/6252758/python-default-comparison – Nick Chapman Jan 03 '19 at 19:05

5 Answers5

8

dataclasses is just a convenience method to avoid having to create a lot of boilerplate code.

You don't actually have to create a class. A tuple with a unique counter value too:

from itertools import count

unique = count()

q.put((priority, next(unique), item))

so that ties between equal priority are broken by the integer that follows; because it is always unique the item value is never consulted.

You can also create a class using straight-up rich comparison methods, made simpler with @functools.total_ordering:

from functools import total_ordering

@total_ordering
class PrioritizedItem:
    def __init__(self, priority, item):
        self.priority = priority
        self.item = item

    def __eq__(self, other):
        if not isinstance(other, __class__):
            return NotImplemented
        return self.priority == other.priority

    def __lt__(self, other):
        if not isinstance(other, __class__):
            return NotImplemented
        return self.priority < other.priority
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • very elegant way to get the distinguishing element – Patrick Artner Jan 03 '19 at 19:26
  • I really like the `counter()` solution. It has the advantage of avoiding to create a class (some lines of code, but not much) and the added benefit that it doesn't require to create instances just for the priority counter. This is a performance issue and probably a very minor issue except when dealing with millions of objects, but I still find this solution more elegant. Thanks! – vincent-lg Jan 03 '19 at 20:48
  • Note that your handling of the counter is not thread-safe; the `queue` module is explicitly thread-safe (I personally use the `heapq` module directly when thread-safety is not required), so this not being thread-safe could be a big issue. – Martijn Pieters Jan 03 '19 at 21:58
  • 1
    @AndyTheEntity: please don't add `self.` to the `__class__` references. It's deliberate, the (unqualified) `__class__` reference is a closure that gives you the class where the method is defined, and not that of a current subclass. – Martijn Pieters Jul 15 '23 at 21:53
2

See priority queue implementation notes - just before the section you quoted (regarding using dataclasses) it tells you how to do it whitout them:

... is to store entries as 3-element list including the priority, an entry count, and the task. The entry count serves as a tie-breaker so that two tasks with the same priority are returned in the order they were added. And since no two entry counts are the same, the tuple comparison will never attempt to directly compare two tasks.

So simply add your items as 3rd element in a tuple (Prio, Count, YourElem) when adding to your queue.

Contreived example:

from queue import PriorityQueue

class CompareError(ValueError): pass

class O:
    def __init__(self,n):
        self.n = n

    def __lq__(self):
        raise CompareError

    def __repr__(self): return str(self)
    def __str__(self): return self.n

def add(prioqueue,prio,item):
    """Adds the 'item' with 'prio' to the 'priorqueue' adding a unique value that
    is stored as member of this method 'add.n' which is incremented on each usage."""
    prioqueue.put( (prio, add.n, item))
    add.n += 1

# no len() on PrioQueue - we ensure our unique integer via method-param
# if you forget to declare this, you get an AttributeError
add.n = 0

h = PriorityQueue()

add(h, 7, O('release product'))
add(h, 1, O('write spec 3'))
add(h, 1, O('write spec 2'))
add(h, 1, O('write spec 1'))
add(h, 3, O('create tests'))

for _ in range(4):
    item = h.get()
    print(item)

Using h.put( (1, O('write spec 1')) ) leads to

TypeError: '<' not supported between instances of 'O' and 'int'`

Using def add(prioqueue,prio,item): pushes triplets as items wich have guaranteed distinct 2nd values so our O()-instances are never used as tie-breaker.

Output:

(1, 2, write spec 3)
(1, 3, write spec 2)
(1, 4, write spec 1)
(3, 5, create tests)

see MartijnPieters answer @here for a nicer unique 2nd element.

Patrick Artner
  • 50,409
  • 9
  • 43
  • 69
  • 1
    Why are you delving into `heappush` when this question is specifically about using `queue.PriorityQueue`? – Nick Chapman Jan 03 '19 at 19:00
  • @NickChapman I what? heappush? where? :o) thanks for hinting my blunder – Patrick Artner Jan 03 '19 at 19:21
  • Thanks a lot, I find this solution much more elegant and optimized. A `Counter` object actually allows to reduce the boiler-plate code even more. That proves: I don't read the documentation right! – vincent-lg Jan 03 '19 at 20:54
  • @vincent-lg well, that documentation is part of the `heapq` module. It so happens that `queue.PriorityQueue` is implemented on top op `heapq`, but it takes some thorough reading to get from the `queue` documentation to that advice in the `heapq` module docs. – Martijn Pieters Jan 03 '19 at 21:53
  • @MartijnPieters heapq is referenced on top of the site that he quoted from: `With a priority queue, the entries are kept sorted (using the heapq module) and the lowest valued entry is retrieved first` thats why I landed there – Patrick Artner Jan 03 '19 at 21:56
  • @PatrickArtner yes, I didn’t say otherwise! But to expect someone looking at the queue module docs to read the heapq module documentation in enough detail to find that section is quite a big ask. – Martijn Pieters Jan 03 '19 at 21:59
0

Let's assume that we don't want to write a decorator with equivalent functionality to dataclass. The problem is that we don't want to have to define all of the comparison operators in order to make our custom class comparable based on priority. The @functools.total_ordering decorator can help. Excerpt:

Given a class defining one or more rich comparison ordering methods, this class decorator supplies the rest. This simplifies the effort involved in specifying all of the possible rich comparison operations:

The class must define one of __lt__(), __le__(), __gt__(), or __ge__(). In addition, the class should supply an __eq__() method.

Using the provided example:

from functools import total_ordering

@total_ordering
class PrioritizedItem:
    # ...

    def __eq__(self, other):
        return self.priority == other.priority

    def __lt__(self, other):
        return self.priority < other.priority
Community
  • 1
  • 1
Chris Hunt
  • 3,840
  • 3
  • 30
  • 46
  • 1
    you only need `__lt__` in order to sort so you don't even need to worry about generating the rest of the rich comparison operators. – Nick Chapman Jan 03 '19 at 18:56
  • @NickChapman: `list.sort` promises to only use `<`, but the fact that `heapq` or `queue.PriorityQueue` only use `<` is an implementation detail. Also, relying on `<`-only comparisons is a recipe for weird surprises anyway. It's best to provide all comparison operations. – user2357112 Jan 03 '19 at 19:09
  • Also, you should really return `NotImplemented` when `other` is of an unrecognized type instead of just unconditionally accessing `other.priority`. – user2357112 Jan 03 '19 at 19:09
  • @user2357112 can you explain why it should be that way? Also wouldn't a `TypeError` be more appropriate? – Chris Hunt Jan 03 '19 at 19:19
  • @ChrisHunt: The TypeError isn't `__lt__`'s responsibility. Comparison methods are supposed to return `NotImplemented` if they don't know how to perform a comparison, to give the other side a chance to implement it. If both sides return `NotImplemented`, the comparison machinery raises a TypeError (or returns False/True for ==/!=). – user2357112 Jan 03 '19 at 19:25
  • @user2357112, thanks. The text [here](https://docs.python.org/3/library/numbers.html#implementing-the-arithmetic-operations) coupled with the description of [`NotImplemented`](https://docs.python.org/3/library/constants.html#NotImplemented) helped me understand. Is the convention to use duck-typing and catch `AttributeError` on accessing the relevant properties on `other` or explicitly check the type as in the arithmetic operations and @MartijnPieters answer? – Chris Hunt Jan 03 '19 at 19:50
  • 1
    @ChrisHunt: Explicit type checking. In this kind of situation, you usually care about the actual type, not what attributes the argument has. – user2357112 Jan 03 '19 at 21:04
0

All you need is a wrapper class that implements __lt__ in order for PriorityQueue to work correctly. This is noted here:

The sort routines are guaranteed to use __lt__() when making comparisons between two objects. So, it is easy to add a standard sort order to a class by defining an __lt__() method

It's as simple as something like this

class PriorityElem:
    def __init__(self, elem_to_wrap):
        self.wrapped_elem = elem_to_wrap

    def __lt__(self, other):
        return self.wrapped_elem.priority < other.wrapped_elem.priority

If your elements do not have priorities then it's as simple as:

class PriorityElem:
    def __init__(self, elem_to_wrap, priority):
        self.wrapped_elem = elem_to_wrap
        self.priority = other.priority

    def __lt__(self, other):
        return self.priority <  other.priority

Now you can use PriorityQueue like so

queue = PriorityQueue()
queue.put(PriorityElem(my_custom_class1, 10))
queue.put(PriorityElem(my_custom_class2, 10))
queue.put(PriorityElem(my_custom_class3, 30))

first_returned_elem = queue.get()
# first_returned_elem is PriorityElem(my_custom_class1, 10)
second_returned_elem = queue.get()
# second_returned_elem is PriorityElem(my_custom_class2, 10)
third_returned_elem = queue.get()
# third_returned_elem is PriorityElem(my_custom_class3, 30)

Getting at your original elements in that case would be as simple as

elem = queue.get().wrapped_elem

Since you don't care about sort stability that's all you need.

Edit: As noted in the comments and confirmed here, heappush is not stable:

unlike sorted(), this implementation is not stable.

Nick Chapman
  • 4,402
  • 1
  • 27
  • 41
  • `list.sort` and `sorted` may be stable, but `queue.PriorityQueue` is based on a heap, and the heap implementation isn't stable. – user2357112 Jan 03 '19 at 19:05
  • The current `heapq` implementation indeed only uses `<` to make comparisons, but this is an *implementation detail*. I'd not rely on such details and instead provide more comparison methods. – Martijn Pieters Jan 03 '19 at 19:16
  • @MartijnPieters if it is an implementation detail then we should create an issue against the documentation excerpted in the post which guarantees that it will only use `__lt__()`. – Chris Hunt Jan 03 '19 at 19:21
  • 1
    @ChrisHunt: a priority queue **doesn't sort**. It is built using the [`heapq`](https://docs.python.org/3/library/heapq.html) module, it's a binary tree stored in a list. – Martijn Pieters Jan 03 '19 at 19:24
  • 1
    Also see [Is it safe to just implement \_\_lt\_\_ for a class that will be sorted?](//stackoverflow.com/a/8796908), where Raymond Hettinger points out you should not really only on `__lt__`. I'll look at why the HOWTO still claims `__lt__` is enough, because no other Python documentation page makes the same claim. – Martijn Pieters Jan 03 '19 at 19:26
  • @MartijnPieters my mistake, I thought the linked documentation was for PriorityQueue. The answer you linked makes a pretty strong argument for `total_ordering` over just defining `__lt__`. – Chris Hunt Jan 03 '19 at 19:32
  • Thanks, this does work and is easy to read. A three-tuple with a counter in the middle might actually be more optimized, it would avoid having to create instances just to feed the priority queue, but that's more of a detail. – vincent-lg Jan 03 '19 at 20:59
  • I’ve filed a documentation issue to see if that claim can be removed. It has no provenance that I can find. See https://bugs.python.org/issue35654 – Martijn Pieters Jan 03 '19 at 21:48
0

another interesting solution in building off Martin's answer is to create a counter and use walrus operator to increment unique tie breakers within the argument.

counter = 0
q.put((priority, counter:= counter+1, item))