Does heapq.heappush() compare on int AND string without been specified for?

Question

I was checking out this solution to a question on leetcode.com

def topKFrequent(self, words, k):
        count = collections.Counter(words)
        heap = [(-freq, word) for word, freq in count.items()]
        heapq.heapify(heap)
        return [heapq.heappop(heap)[1] for _ in xrange(k)]

and when I provide it an array of strings like ["aa", "aaa", "a"] and 1 it correctly returns ["a"]. My question is did the heap also lexographically sort the tuples internally? Because according to me if there was no sorting, it would have simply returned ["aa"] (the order in which the heap was constructed since counts of all three are the same). Or have I misunderstood heapq?

Sorted, no. *Ordered*, yes. (Though the chosen ordering may coincidentally be a sorted order.) — chepner, Feb 19 '20 at 17:22

score 3 · Answer 1 · answered Feb 19 '20 at 17:14

You have a heap of integer/string pairs, and so it is ordered based on the definition of < for tuples, which takes into account both elements of each type.

Given ["aa", "aaa", "a"], count.items() is sequence of tuples [('aa', 1), ('aaa', 1), ('a', 1)]. You then build a heap using the list of tuples

[(-1, 'aa'), (-1, 'aaa'), (-1, 'a')]

Since the first element of each tuple is the same, the comparisons are determined solely by the second, string, element.

brandizzi · Accepted Answer · 2020-11-23T13:09:05.993

heapq just compares values from the queue using using the "less than" operator ^[1] regardless of the type of the value. It is the type of the value that defines what the comparison will return. So, what makes the difference here is the tuple itself. As from the documentation:

The comparison [of sequence objects] uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted.

Checking some examples:

>>> (0, 'a') < (1, 'aa')
True
>>> (1, 'a') < (1, 'aa')
True
>>> (1, 'aa') < (1, 'a')
False
>>> (2, 'a') < (1, 'aa')
False

So you are right, the values are ordered lexicographically and the second value of the tuple is relevant. However, heapq does not have to do anything here to get this result, the mere tuple comparison does that.

^[1] One can check it in the code. Here is one of the lines where the comparison is made by heapq (in C):

cmp = PyObject_RichCompareBool(newitem, parent, Py_LT);

This PyObject_RichCompareBool() is, according to the documentation:

the equivalent of the Python expression o1 op o2, where op is the operator corresponding to opid.

More on tuple comparisons [here](https://stackoverflow.com/questions/5292303/how-does-tuple-comparison-work-in-python). — brandizzi, Feb 19 '20 at 18:13
Great answer. It's a pity that the API docs themselves don't shed much light on how tuples in a heapq are actually compared. — Holger Brandl, Nov 22 '20 at 09:51

score 0 · Answer 3 · answered Feb 19 '20 at 17:14

0

Heaps are partial orderings. They are not sorted. You can, however, build sorts out of them by storing values in a heap and pulling them out one at a time. These sorts are not stable, because heaps do not try to preserve the ordering of "equal" values.

Here's another kind of Python heap you might be interested in: https://pypi.org/project/fibonacci-heap-mod/

answered Feb 19 '20 at 17:14

dstromberg

6,954
1
26
27

Though it's worth pointing out that heaps obey the [heap property](https://xlinux.nist.gov/dads/HTML/heapproperty.html), which a sorted list satisfies. A heap doesn't *have* to be sorted, but it *can* be. – chepner Feb 19 '20 at 17:20

score 0 · Answer 4 · answered Jun 21 '20 at 17:30

The expectation of the leetcode question is to solve the problem in O(nlogk). So we have to keep only 'k' elements in the heap at any time, which means we have to use "minHeap" (freq, word) and not (-freq,word).

We want 'minHeap' to keep the 'minimum frequency' and 'max lexicographical' value at the top of the heap. That is tricky, because by default it would keep 'minimum frequency' and 'min lex'.

The only solution is to create an object that can have 'freq' and 'word' and override the 'lt' method to do this

def __lt__(self, other):
    if self.c == other.c:
        return self.w > other.w
    return self.c < other.c

How would you use __lt__? – user3303020 May 26 '21 at 04:43 — user3303020, May 26 '21 at 04:43

Does heapq.heappush() compare on int AND string without been specified for?

4 Answers4