66

I need to use a priority queue in my Python code, and:

  • am looking for any fast implementations for priority queues
  • optimally, I'd like the queue to be generic (i.e. work well for any object with a specified comparison operator).

Looking around for something efficient, I came upon heapq, but:

  • I'm looking for something faster than heapq, which is implemented in native Python, so it's not fast.
  • It looks good, but seems to be specified only for integers. I suppose it works with any objects that have comparison operators, but it doesn't specify what comparison operators it needs.
  • Update: Re comparison in heapq, I can either use a (priority, object) as Charlie Martin suggests, or just implement __cmp__ for my object.
smci
  • 32,567
  • 20
  • 113
  • 146
Eli Bendersky
  • 263,248
  • 89
  • 350
  • 412
  • 2
    The fact that heapq is implemented in Python does not necessarily means that it is not fast. Why not just use it? Only try alternatives if it does not satisfy your performance needs. – Jingguo Yao Dec 12 '13 at 08:16

13 Answers13

52

You can use Queue.PriorityQueue.

Recall that Python isn't strongly typed, so you can save anything you like: just make a tuple of (priority, thing) and you're set.

Adrian W
  • 4,563
  • 11
  • 38
  • 52
Charlie Martin
  • 110,348
  • 25
  • 193
  • 263
  • PriorityQueue is great, works as intended but beware, it's 2.6 only. If your code will work on a platform older than 2.6.0 then you've to carry it around like a backport. – Berk D. Demir Jan 02 '09 at 19:17
  • 4
    Doesn't have a peek function :-( – Casebash Sep 28 '09 at 08:19
  • It solves original questioner's actual problem, which is to manage a queue of non-integers. – Charlie Martin Sep 09 '10 at 18:20
  • 4
    @Eli Bendersky: did you do a performance comparison between this and `heapq`? I would assume `heapq` to be faster because it doesn't do any locking. – Fred Foo Jun 24 '11 at 19:17
  • 19
    @larsmans I did some simple tests in Python2.6 that suggest that `heapq` is roughly twice as fast as `PriorityQueue` – simonb Aug 10 '11 at 23:32
  • 10
    Queue.PriorityQueue is synchronized. For situations where synchronization is unneeded, it incurred unneeded overhead. – Jingguo Yao Dec 12 '13 at 08:12
  • 1
    I take issue with your answer. Python is strongly typed :). See for example: http://stackoverflow.com/questions/11328920/is-python-strongly-typed It is not statically typed, though. – Gurgeh May 23 '14 at 21:45
  • I take issue with that definition of strongly-typed. So there. – Charlie Martin May 25 '14 at 01:15
  • 5
    Python -is- strongly typed. It's not statically, manifestly typed. – user1277476 Nov 06 '14 at 17:50
  • 1
    Come on, guys, go look up "strong typing". – Charlie Martin Jan 23 '15 at 15:19
  • 1
    @CharlieMartin to jump on this off-topic bandwagon, that's why I perceive it futile to call something like this "strong". It doesn't really mean anything, I mean its meaning is really implicit, will change in time and otherwise more sensitive to different definitions than it needs to be. It's like calling the first sequel to a publication Foo that really should be significantly updated and re-released periodically "The New Foo". By the way, if you look up [strong typing](https://en.wikipedia.org/wiki/Strong_and_weak_typing#Variation_across_programming_languages), you'll see how it's changing. – n611x007 May 10 '15 at 16:27
  • 1
    @naxa Yeah, I'm probably fighting to preserve precise definitions in vain. Of course, even that article points out that there are more or less precise definitions for the kind of type checking at runtime done by, eg, Python; and how languages with type inference are strongly typed but don't necessarily have to have the type specified in the program text if the type can be inferred logically; and how the type can be modified at runtime in languages that support duck typing. Now, the word "type" is well defined. String typing is actually well-defined. – Charlie Martin May 10 '15 at 23:14
  • 1
    (cont...) But "strong typing" sounds like it is a really really good thing to have, so everyone's favorite language must be strongly typed, definitions be damned. 30 years ago, "object oriented" was what everyone wanted, and so everything had to be object oriented, and definitions be damned. – Charlie Martin May 10 '15 at 23:21
  • 1
    @Casebash It has a `pq.queue` field, which is a `list`, and the first element is guaranteed to be the smallest (by `heapq`). So a peek would be: `pq.queue[0]` (provided `not pq.empty()`). Not for sterling-quality code though, as there is no contract about this in the documentation. – Evgeni Sergeev Jun 02 '16 at 03:32
  • 1
    `Queue.PriorityQueue` (`queue.PriorityQueue` in Python 3) is a terrible choice. The fundamental problem is that, despite just being called "queue", the `queue` library is really an inter-thread message channel library. `queue.PriorityQueue` is designed around being a prioritized inter-thread message channel, not a general-purpose priority queue. – user2357112 Feb 18 '22 at 09:17
  • That means it's slow, as other commenters have mentioned. It also means it has design decisions like, if you call `queue.get()` on an empty queue, it hangs forever instead of raising an exception. (You can call `get_nowait` instead, but if you forget, your program hangs.) It's also missing a lot of functionality that's useful for general-purpose queues but not useful for an inter-thread message channel, like a way to peek at an element without removing it. (You can access the undocumented `queue` attribute, but that's undocumented.) – user2357112 Feb 18 '22 at 09:24
25

When using a priority queue, decrease-key is a must-have operation for many algorithms (Dijkstra's Algorithm, A*, OPTICS), I wonder why Python's built-in priority queue does not support it. None of the other answers supply a solution that supports this functionality.

A priority queue which also supports decrease-key operation is this implementation by Daniel Stutzbach worked perfectly for me with Python 3.5.

from heapdict import heapdict

hd = heapdict()
hd["two"] = 2
hd["one"] = 1
obj = hd.popitem()
print("object:",obj[0])
print("priority:",obj[1])

# object: one
# priority: 1
Billal Begueradj
  • 20,717
  • 43
  • 112
  • 130
Guy s
  • 1,586
  • 3
  • 20
  • 27
  • 3
    Seems a reasonable answer: downvoters should explain themselves here. – WestCoastProjects Apr 22 '17 at 15:08
  • 1
    The documentation of (heapdict)[] says `hd.pop()` but calling `heapdict({None: 1}).pop()` gives `TypeError: pop() missing 1 required positional argument: 'key'`, similarly to a regular dict. Use `popitem()[0]` instead... – user66081 Dec 22 '18 at 14:29
  • The documentation has since been fixed - the erroneous `pop` example has been changed to use `popitem`. – user2357112 Feb 18 '22 at 09:06
19

I ended up implementing a wrapper for heapq, adding a dict for maintaining the queue's elements unique. The result should be quite efficient for all operators:

class PriorityQueueSet(object):

    """
    Combined priority queue and set data structure.

    Acts like a priority queue, except that its items are guaranteed to be
    unique. Provides O(1) membership test, O(log N) insertion and O(log N)
    removal of the smallest item.

    Important: the items of this data structure must be both comparable and
    hashable (i.e. must implement __cmp__ and __hash__). This is true of
    Python's built-in objects, but you should implement those methods if you
    want to use the data structure for custom objects.
    """

    def __init__(self, items=[]):
        """
        Create a new PriorityQueueSet.

        Arguments:
            items (list): An initial item list - it can be unsorted and
                non-unique. The data structure will be created in O(N).
        """
        self.set = dict((item, True) for item in items)
        self.heap = self.set.keys()
        heapq.heapify(self.heap)

    def has_item(self, item):
        """Check if ``item`` exists in the queue."""
        return item in self.set

    def pop_smallest(self):
        """Remove and return the smallest item from the queue."""
        smallest = heapq.heappop(self.heap)
        del self.set[smallest]
        return smallest

    def add(self, item):
        """Add ``item`` to the queue if doesn't already exist."""
        if item not in self.set:
            self.set[item] = True
            heapq.heappush(self.heap, item)
jsmedmar
  • 1,025
  • 10
  • 21
Eli Bendersky
  • 263,248
  • 89
  • 350
  • 412
  • 3
    Looks good, but you should use "item in set" rather than "set.has_key(item)". It's faster (less method call overhead), and the second one has been removed in Python 3.0. – Kiv Jan 02 '09 at 20:28
  • 2
    `items=[]` is a bad idea since the list is mutable. In addition, you can do `self.set=set(items)` in `__init__()`. – Elazar Dec 05 '13 at 19:35
  • 1
    Looks cleaner than the implementation provided in [docs](https://docs.python.org/2/library/heapq.html#priority-queue-implementation-notes). – alecxe Aug 27 '15 at 21:44
  • 2
    @alecxe i think that's because this doesn't support `decrease-key`, and `decrease-key` is the reason for the whole "REMOVED" concept in those docs (and it's that "REMOVED" logic which makes the functions look less clean) – tscizzle Oct 22 '16 at 21:13
13

You can use heapq for non-integer elements (tuples):

import heapq

heap = []
data = [(10,"ten"), (3,"three"), (5,"five"), (7,"seven"), (9, "nine"), (2,"two")]
for item in data:
    heapq.heappush(heap, item)
sorted_data = []
while heap:
    sorted_data.append(heapq.heappop(heap))
print(sorted_data)
data.sort()
print(data == sorted_data)

This will be significantly faster than the queue.PriorityQueue option recommended in the top answer, and unlike queue.PriorityQueue, heapq won't hang forever if you try to pop from an empty heap.

user2357112
  • 260,549
  • 28
  • 431
  • 505
Adrian
  • 141
  • 1
  • 2
  • 1
    This is a good method, when using it, it's useful to add a dummy counter (always increasing each call to `heappush` as the 2nd element of the tuple so that when two entries have the same priority (meaning the first element of the tuple is equal), the tuples are sorted according to the order they were added. This gives the expected result for a priority queue in this edge case. – David Parks Jul 04 '19 at 01:22
7

Did you look at the "Show Source" link on the heapq page? There's an example a little less than halfway down of using a heap with a list of (int, char) tuples as a priority queue.

Harper Shelby
  • 16,475
  • 2
  • 44
  • 51
7

I've not used it, but you could try PyHeap. It's written in C so hopefully it is fast enough for you.

Are you positive heapq/PriorityQueue won't be fast enough? It might be worth going with one of them to start, and then profiling to see if it really is your performance bottlneck.

Kiv
  • 31,940
  • 6
  • 44
  • 59
4

I am implementing a priority queue in python 3 using queue.PriorityQueue like this-

from queue import PriorityQueue

class PqElement(object):
    def __init__(self, value: int):
        self.val = value

    #Custom Compare Function (less than or equsal)
    def __lt__(self, other):
        """self < obj."""
        return self.val > other.val

    #Print each element function
    def __repr__(self):
        return f'PQE:{self.val}'

#Usage-
pq = PriorityQueue()
pq.put(PqElement(v))       #Add Item      - O(Log(n))
topValue = pq.get()        #Pop top item  - O(1)
topValue = pq.queue[0].val #Get top value - O(1)
Abrar Jahin
  • 13,970
  • 24
  • 112
  • 161
  • seems to be a max queue not min queue? `self.val > other.val` If not then can you explain the polarity here? thx – WestCoastProjects Oct 29 '21 at 00:29
  • You are correct @WestCoastProjects, I have given my implement so that anyone can modify it as they need. If you change the `__lt__` function from `self.val > other.val` to `self.val < other.val`, then it would be min queue. – Abrar Jahin Oct 29 '21 at 03:13
2

This is efficient and works for strings or any type input as well -:)

import itertools
from heapq import heappush, heappop

pq = []                         # list of entries arranged in a heap
entry_finder = {}               # mapping of tasks to entries
REMOVED = '<removed-task>'      # placeholder for a removed task
counter = itertools.count()     # unique sequence count

def add_task(task, priority=0):
    'Add a new task or update the priority of an existing task'
    if task in entry_finder:
        remove_task(task)
    count = next(counter)
    entry = [priority, count, task]
    entry_finder[task] = entry
    heappush(pq, entry)

def remove_task(task):
    'Mark an existing task as REMOVED.  Raise KeyError if not found.'
    entry = entry_finder.pop(task)
    entry[-1] = REMOVED

def pop_task():
    'Remove and return the lowest priority task. Raise KeyError if empty.'
    while pq:
        priority, count, task = heappop(pq)
        if task is not REMOVED:
            del entry_finder[task]
            return task
    raise KeyError('pop from an empty priority queue')

Reference: http://docs.python.org/library/heapq.html

Saurabh
  • 5,176
  • 4
  • 32
  • 46
codersofthedark
  • 9,183
  • 8
  • 45
  • 70
1

I've got a priority queue / fibonacci heap at https://pypi.python.org/pypi/fibonacci-heap-mod

It's not fast (large constant c on delete-min, which is O(c*logn)). But find-min, insert, decrease-key and merge are all O(1) - IOW, it's lazy.

If it's too slow on CPython, you might try Pypy, Nuitka or even CPython+Numba :)

user1277476
  • 2,871
  • 12
  • 10
1

A simple implement:

since PriorityQueue is lower first.

from queue import PriorityQueue


class PriorityQueueWithKey(PriorityQueue):
    def __init__(self, key=None, maxsize=0):
        super().__init__(maxsize)
        self.key = key

    def put(self, item):
        if self.key is None:
            super().put((item, item))
        else:
            super().put((self.key(item), item))

    def get(self):
        return super().get(self.queue)[1]


a = PriorityQueueWithKey(abs)
a.put(-4)
a.put(-3)
print(*a.queue)
AsukaMinato
  • 1,017
  • 12
  • 21
0

I can either use a (priority, object) as Charlie Martin suggests, or just implement __cmp__ for my object.

If you want inserted objects to be prioritized by a specific rule, I found it very helpful to write a simple subclass of PriorityQueue which accepts a key-function. You won't have to insert (priority, object) tuples manually and the handling feels more natural.

Demo of the desired behavior:

>>> h = KeyHeap(sum)
>>> h.put([-1,1])
>>> h.put((-1,-2,-3))
>>> h.put({100})
>>> h.put([1,2,3])
>>> h.get()
(-1, -2, -3)
>>> h.get()
[-1, 1]
>>> h.get()
[1, 2, 3]
>>> h.get()
set([100])
>>> h.empty()
True
>>>
>>> k = KeyHeap(len)
>>> k.put('hello')
>>> k.put('stackoverflow')
>>> k.put('!')
>>> k.get()
'!'
>>> k.get()
'hello'
>>> k.get()
'stackoverflow'

Python 2 code

from Queue import PriorityQueue

class KeyHeap(PriorityQueue):
    def __init__(self, key, maxsize=0):            
        PriorityQueue.__init__(self, maxsize)
        self.key = key

    def put(self, x):
        PriorityQueue.put(self, (self.key(x), x))

    def get(self):
        return PriorityQueue.get(self)[1]

Python 3 code

from queue import PriorityQueue

class KeyHeap(PriorityQueue):
    def __init__(self, key, maxsize=0):            
        super().__init__(maxsize)
        self.key = key

    def put(self, x):
        super().put((self.key(x), x))

    def get(self):
        return super().get()[1]

Obviously, calling put will (and should!) raise an error if you try to insert an object which your key-function cannot process.

timgeb
  • 76,762
  • 20
  • 123
  • 145
0

If you want to keep an entire list ordered, not just the top value, I've used some variation of this code in multiple projects, it's a drop in replacement for the standard list class with a similar api:

import bisect

class OrderedList(list):
    """Keep a list sorted as you append or extend it

    An ordered list, this sorts items from smallest to largest using key, so
    if you want MaxQueue like functionality use negative values: .pop(-1) and
    if you want MinQueue like functionality use positive values: .pop(0)
    """
    def __init__(self, iterable=None, key=None):
        if key:
            self.key = key
        self._keys = []
        super(OrderedList, self).__init__()
        if iterable:
            for x in iterable:
                self.append(x)

    def key(self, x):
        return x

    def append(self, x):
        k = self.key(x)
        # https://docs.python.org/3/library/bisect.html#bisect.bisect_right
        i = bisect.bisect_right(self._keys, k)
        if i is None:
            super(OrderedList, self).append((self.key(x), x))
            self._keys.append(k)
        else:
            super(OrderedList, self).insert(i, (self.key(x), x))
            self._keys.insert(i, k)

    def extend(self, iterable):
        for x in iterable:
            self.append(x)

    def remove(self, x):
        k = self.key(x)
        self._keys.remove(k)
        super(OrderedList, self).remove((k, x))

    def pop(self, i=-1):
        self._keys.pop(i)
        return super(OrderedList, self).pop(i)[-1]

    def clear(self):
        super(OrderedList, self).clear()
        self._keys.clear()

    def __iter__(self):
        for x in super(OrderedList, self).__iter__():
            yield x[-1]

    def __getitem__(self, i):
        return super(OrderedList, self).__getitem__(i)[-1]

    def insert(self, i, x):
        raise NotImplementedError()
    def __setitem__(self, x):
        raise NotImplementedError()
    def reverse(self):
        raise NotImplementedError()
    def sort(self):
        raise NotImplementedError()

It can handle tuples like (priority, value) by default but you can also customize it like this:

class Val(object):
    def __init__(self, priority, val):
        self.priority = priority
        self.val = val

h = OrderedList(key=lambda x: x.priority)

h.append(Val(100, "foo"))
h.append(Val(10, "bar"))
h.append(Val(200, "che"))

print(h[0].val) # "bar"
print(h[-1].val) # "che"
Jaymon
  • 5,363
  • 3
  • 34
  • 34
0

If you only have a single "higher priority" level rather than arbitrarily many as supported by queue.PriorityQueue, you can efficiently use a collections.deque for this by inserting normal jobs at the left .appendleft(), and inserting your higher-priority entries at the right .append()

Both queue and deque instances have threadsafe push/pop methods

Misc advantages to Deques

  • allows peeking arbitrary elements (indexable and iterable without popping, while queue instances can only be popped)
  • significantly faster than queue.PriorityQueue (see sketchy testing below)

Cautions about length limitations

  • setting a length will let it push elements out of either end, not just off the left, unlike queue instances, which block or raise queue.Full
  • any unbounded collection will eventually run your system out of memory if input rate exceeds consumption
import threading
from collections import deque as Deque

Q = Deque()  # don't set a maximum length

def worker_queue_creator(q):
    sleepE = threading.Event()  # use wait method for sleeping thread
    sleepE.wait(timeout=1)

    for index in range(3):  # start with a few jobs
        Q.appendleft("low job {}".format(index))

    Q.append("high job 1")  # add an important job

    for index in range(3, 3+3):  # add a few more jobs
        Q.appendleft("low job {}".format(index))

    # one more important job before ending worker
    sleepE.wait(timeout=2)
    Q.append("high job 2")

    # wait while the consumer worker processes these before exiting
    sleepE.wait(timeout=5)

def worker_queue_consumer(q):
    """ daemon thread which consumes queue forever """
    sleepE = threading.Event()  # use wait method for sleeping thread
    sleepE.wait(timeout=1)  # wait a moment to mock startup
    while True:
        try:
            pre_q_str = str(q)  # see what the Deque looks like before before pop
            job = q.pop()
        except IndexError:  # Deque is empty
            pass            # keep trying forever
        else:  # successfully popped job
            print("{}: {}".format(job, pre_q_str))
        sleepE.wait(timeout=0.4)  # quickly consume jobs


# create threads to consume and display the queue
T = [
    threading.Thread(target=worker_queue_creator, args=(Q,)),
    threading.Thread(target=worker_queue_consumer, args=(Q,), daemon=True),
]

for t in T:
    t.start()

T[0].join()  # wait on sleep in worker_queue_creator to quit
% python3 deque_as_priorityqueue.py
high job 1: deque(['low job 5', 'low job 4', 'low job 3', 'low job 2', 'low job 1', 'low job 0', 'high job 1'])
low job 0: deque(['low job 5', 'low job 4', 'low job 3', 'low job 2', 'low job 1', 'low job 0'])
low job 1: deque(['low job 5', 'low job 4', 'low job 3', 'low job 2', 'low job 1'])
low job 2: deque(['low job 5', 'low job 4', 'low job 3', 'low job 2'])
low job 3: deque(['low job 5', 'low job 4', 'low job 3'])
high job 2: deque(['low job 5', 'low job 4', 'high job 2'])
low job 4: deque(['low job 5', 'low job 4'])
low job 5: deque(['low job 5'])

Comparison

import timeit

NUMBER = 1000

values_builder = """
low_priority_values  = [(1, "low-{}".format(index)) for index in range(5000)]
high_priority_values = [(0, "high-{}".format(index)) for index in range(1000)]
"""

deque_setup = """
from collections import deque as Deque
Q = Deque()
"""
deque_logic_input = """
for item in low_priority_values:
    Q.appendleft(item[1])  # index into tuples to remove priority
for item in high_priority_values:
    Q.append(item[1])
"""
deque_logic_output = """
while True:
    try:
        v = Q.pop()
    except IndexError:
        break
"""

queue_setup = """
from queue import PriorityQueue
from queue import Empty
Q = PriorityQueue()
"""
queue_logic_input = """
for item in low_priority_values:
    Q.put(item)
for item in high_priority_values:
    Q.put(item)
"""

queue_logic_output = """
while True:
    try:
        v = Q.get_nowait()
    except Empty:
        break
"""

# abuse string catenation to build the setup blocks
results_dict = {
    "deque input":  timeit.timeit(deque_logic_input, setup=deque_setup+values_builder, number=NUMBER),
    "queue input":  timeit.timeit(queue_logic_input, setup=queue_setup+values_builder, number=NUMBER),
    "deque output": timeit.timeit(deque_logic_output, setup=deque_setup+values_builder+deque_logic_input, number=NUMBER),
    "queue output": timeit.timeit(queue_logic_output, setup=queue_setup+values_builder+queue_logic_input, number=NUMBER),
}

for k, v in results_dict.items():
    print("{}: {}".format(k, v))

Results (6000 elements pushed and popped, timeit number=1000)

% python3 deque_priorityqueue_compare.py
deque input: 0.853059
queue input: 24.504084000000002
deque output: 0.0013576999999997952
queue output: 0.02025689999999969

While this is a fabricated example to show off deque's performance, PriorityQueue's insert time is some significant function of its length and O(log n) or worse, while a Deque is O(1), so it should be fairly representative of a real use case

ti7
  • 16,375
  • 6
  • 40
  • 68