140

I am trying to build a heap with a custom sort predicate. Since the values going into it are of "user-defined" type, I cannot modify their built-in comparison predicate.

Is there a way to do something like:

h = heapq.heapify([...], key=my_lt_pred)
h = heapq.heappush(h, key=my_lt_pred)

Or even better, I could wrap the heapq functions in my own container so I don't need to keep passing the predicate.

blackraven
  • 5,284
  • 7
  • 19
  • 45
vsekhar
  • 5,090
  • 5
  • 22
  • 23
  • 1
    possible duplicate of http://stackoverflow.com/questions/679731/min-heap-in-python – Rajendran T Jan 16 '12 at 04:30
  • possible duplicate of [How to make heapq evaluate the heap off of a specific attribute?](http://stackoverflow.com/questions/3954530/how-to-make-heapq-evaluate-the-heap-off-of-a-specific-attribute) – Cristian Ciupitu Aug 31 '15 at 02:47

8 Answers8

163

According to the heapq documentation, the way to customize the heap order is to have each element on the heap to be a tuple, with the first tuple element being one that accepts normal Python comparisons.

The functions in the heapq module are a bit cumbersome (since they are not object-oriented), and always require our heap object (a heapified list) to be explicitly passed as the first parameter. We can kill two birds with one stone by creating a very simple wrapper class that will allow us to specify a key function, and present the heap as an object.

The class below keeps an internal list, where each element is a tuple, the first member of which is a key, calculated at element insertion time using the key parameter, passed at Heap instantiation:

# -*- coding: utf-8 -*-
import heapq

class MyHeap(object):
    def __init__(self, initial=None, key=lambda x:x):
        self.key = key
        self.index = 0
        if initial:
            self._data = [(key(item), i, item) for i, item in enumerate(initial)]
            self.index = len(self._data)
            heapq.heapify(self._data)
        else:
            self._data = []

    def push(self, item):
        heapq.heappush(self._data, (self.key(item), self.index, item))
        self.index += 1

    def pop(self):
        return heapq.heappop(self._data)[2]

(The extra self.index part is to avoid clashes when the evaluated key value is a draw and the stored value is not directly comparable - otherwise heapq could fail with TypeError)

Community
  • 1
  • 1
jsbueno
  • 99,910
  • 10
  • 151
  • 209
  • 5
    Very nice! You could even go further and use triples (self.key(item), id, item), where id could be an integer handled as a class attribute, and incremented after each push. That way, you avoid the exception raised when key(item1) = key(item2). Because keys would be unique. – zeycus Jul 29 '16 at 19:30
  • 7
    I actually tried to push this (or something based on this) into Python's stdlib,and the suggestion got declined. – jsbueno Jul 30 '16 at 01:02
  • 1
    pity, fits the object-oriented style of most Python features, and the key argument provides extra flexibility. – zeycus Jul 31 '16 at 14:47
  • I have used list instead of tuple for e.g. [self.key(item), id, item] and it works just fine as long as first index is key. – Deepak Yadav May 10 '18 at 10:17
  • @DeepakYadav things might go messy, as lists are mutable, and tuples are not. If you'll change the priority (the [0]) the heap will be broken. – Michal Jul 18 '18 at 16:59
  • @jsbueno sorry for a stupid question but I'm new to python and looking at your example, I don't see how the custom predicate is defined. Shouldn't there be some function that defines a strict weak less than ordering? I assume self.key(item) is used to for the predicate is this is where you can define custom behavior but you just returned the value as is in your example, correct? But I don't see how to consume this when you need a custom ordering, like for example if you want to order based on both values of pair you'll need to have a comparison of two pairs and return a true/false for <. – ssj_100 Sep 23 '18 at 19:18
  • The `key` argument should be a callable object (ex. a function), that taking the main item as input, yields another object that is directly comparable by Python with `>` and `<=` in the desired order. The idea is the same of the `key` parameter on the built-in `sorted` method - https://docs.python.org/3/library/functions.html#sorted – jsbueno Sep 23 '18 at 19:42
  • 8
    This would fail if elements are not comparable and there are ties in key values. I'd put `id(item)` as a middle element of the tuple to break ties. – Georgi Yanchev May 09 '19 at 19:54
  • @GeorgiYanchev - I added that now. Thanks for the feedback. – jsbueno Mar 20 '20 at 14:33
148

Define a class, in which override the __lt__() function. See example below (works in Python 3.7):

import heapq

class Node(object):
    def __init__(self, val: int):
        self.val = val

    def __repr__(self):
        return f'Node value: {self.val}'

    def __lt__(self, other):
        return self.val < other.val

heap = [Node(2), Node(0), Node(1), Node(4), Node(2)]
heapq.heapify(heap)
print(heap)  # output: [Node value: 0, Node value: 2, Node value: 1, Node value: 4, Node value: 2]

heapq.heappop(heap)
print(heap)  # output: [Node value: 1, Node value: 2, Node value: 2, Node value: 4]

Fanchen Bao
  • 3,310
  • 1
  • 21
  • 34
  • 20
    This seems like the cleanest solution by far! – Foobar Apr 03 '20 at 02:10
  • 1
    Absolutely agree with the previous two comments. This seems to be a better, cleaner solution for Python 3. – Chiraz BenAbdelkader Jun 30 '20 at 17:37
  • Also, here is very similar solution to a similar question: https://stackoverflow.com/questions/2501457/what-do-i-use-for-a-max-heap-implementation-in-python/40455775?noredirect=1#comment110575517_40455775 – Chiraz BenAbdelkader Jun 30 '20 at 17:41
  • 2
    I tested this using `__gt__` instead and it works as well. Why doesn't it matter which magic method we use? I can't find anything in `heapq`'s documentation. Maybe it's related to how Python does comparisons in general? – Josh Clark Sep 07 '20 at 21:57
  • 12
    When doing a comparison in `heapq`, Python looks for `__lt__()` first. If it is not defined, it will look for `__gt__()`. If neither is defined, it throws `TypeError: '<' not supported between instances of 'Node' and 'Node'`. This can be confirmed by defining both `__lt__()` and `__gt__()`, placing a print statement in each, and having `__lt__()` return `NotImplemented`. – Fanchen Bao Sep 07 '20 at 23:11
  • 1
    To make this solution a complete one, there needs to be a tie breaker. In order to break the tie when "self.val == other.val" in the "__lt__" function, one option is to introduce an other field (priority or something that is pertinent to your business domain) into Node class, so that we could compare this field and make sure there are not equal values regarding this field. – Yiling May 01 '21 at 08:04
29

The heapq documentation suggests that heap elements could be tuples in which the first element is the priority and defines the sort order.

More pertinent to your question, however, is that the documentation includes a discussion with sample code of how one could implement their own heapq wrapper functions to deal with the problems of sort stability and elements with equal priority (among other issues).

In a nutshell, their solution is to have each element in the heapq be a triple with the priority, an entry count and the element to be inserted. The entry count ensures that elements with the same priority a sorted in the order they were added to the heapq.

srgerg
  • 18,719
  • 4
  • 57
  • 39
  • This is the correct solution, both heappush and heappushpop works directly with tuples – daisy May 05 '17 at 04:52
  • 1
    this solution is clean but can not cover all custom algorithm, for example, a max heap of string. – PaleNeutron May 20 '21 at 03:41
  • i really don't understand how people find submitting an entry count a clean solution. gods of python hear my prayers and make a normal priority queue class. thanks! – aleksandarbos Feb 08 '23 at 11:42
22
setattr(ListNode, "__lt__", lambda self, other: self.val <= other.val)

Use this for comparing values of objects in heapq

Ritika Gupta
  • 493
  • 4
  • 14
3

The limitation with both answers is that they don't allow ties to be treated as ties. In the first, ties are broken by comparing items, in the second by comparing input order. It is faster to just let ties be ties, and if there are a lot of them it could make a big difference. Based on the above and on the docs, it is not clear if this can be achieved in heapq. It does seem strange that heapq does not accept a key, while functions derived from it in the same module do.
P.S.: If you follow the link in the first comment ("possible duplicate...") there is another suggestion of defining le which seems like a solution.

bbphd
  • 31
  • 2
1

In python3, you can use cmp_to_key from functools module. cpython source code.

Suppose you need a priority queue of triplets and specify the priority use the last attribute.

from heapq import *
from functools import cmp_to_key
def mycmp(triplet_left, triplet_right):
    key_l, key_r = triplet_left[2], triplet_right[2]
    if key_l > key_r:
        return -1  # larger first
    elif key_l == key_r:
        return 0  # equal
    else:
        return 1


WrapperCls = cmp_to_key(mycmp)
pq = []
myobj = tuple(1, 2, "anystring")
# to push an object myobj into pq
heappush(pq, WrapperCls(myobj))
# to get the heap top use the `obj` attribute
inner = pq[0].obj

Performance Test:

Environment

python 3.10.2

Code

from functools import cmp_to_key
from timeit import default_timer as time
from random import randint
from heapq import *

class WrapperCls1:
    __slots__ = 'obj'
    def __init__(self, obj):
        self.obj = obj
    def __lt__(self, other):
        kl, kr = self.obj[2], other.obj[2]
        return True if kl > kr else False

def cmp_class2(obj1, obj2):
    kl, kr = obj1[2], obj2[2]
    return -1 if kl > kr else 0 if kl == kr else 1

WrapperCls2 = cmp_to_key(cmp_class2)

triplets = [[randint(-1000000, 1000000) for _ in range(3)] for _ in range(100000)]
# tuple_triplets = [tuple(randint(-1000000, 1000000) for _ in range(3)) for _ in range(100000)]

def test_cls1():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls1(triplet))
        
def test_cls2():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls2(triplet))

def test_cls3():
    pq = []
    for triplet in triplets:
        heappush(pq, (-triplet[2], triplet))

start = time()
for _ in range(10):
    test_cls1()
    # test_cls2()
    # test_cls3()
print("total running time (seconds): ", -start+(start:=time()))

Results

use list instead of tuple, per function:

  • WrapperCls1: 16.2ms
  • WrapperCls1 with __slots__: 9.8ms
  • WrapperCls2: 8.6ms
  • move the priority attribute into the first position ( don't support custom predicate ): 6.0ms.

Therefore, this method is slightly faster than using a custom class with an overridden __lt__() function and the __slots__ attribute.

Voyager
  • 727
  • 8
  • 26
0

Simple and Recent

A simple solution is to store entries as a list of tuples for each tuple define the priority in your desired order if you need a different order for each item within the tuple just make it the negative for descending order.

See the official heapq python documentation in this topic Priority Queue Implementation Notes

csmaster
  • 579
  • 4
  • 14
0

Simple little trick:

Say you have this list of (name,age) as

a = [('Tim',4), ('Radha',9), ('Rob',7), ('Krsna',3)]

And you want to sort this list based on their ageby adding them to a min-heap, instead of writing all the custom comparator stuff, you can just flip the order of the contents of the tuple just before pushing it to the queue. This is because heapq.heappush() sorts by the first element of the tuple by default. Like this:

import heapq
heap = []
heapq.heapify(heap)
for element in a:
    heapq.heappush(heap, (element[1],element[0]))

This is a simple trick if this does your job and you don't want to get into writing the custom comparator mess.

Similarly it sorts the values in ascending order by default. If you want to sort in descending order of age, flip the contents and make the value of the first element of the tuple a negative:

import heapq
heap = []
heapq.heapify(heap)
for element in a:
    heapq.heappush(heap, (-element[1],element[0]))
plamsal
  • 11
  • 2