heapq custom compareTo

Question

I'm trying to define a custome method to make an ordered insert into a priority queue in python but not getting the expected results. Once defined the insert method into the queue like the following:

def insert(self, node):
    if isinstance(node, treeNode):
        heapq.heappush(self._frontier, (node._f, node))
    else:
        print("Error. It is not a node.")

And implementing the following lt in the 'node' class:

def __lt__(self, other):
    return self._f < other._f

The insertion is not done by the 'f' attribute value which is what I wanna do, insert in an ascendant order determined by that value. Any help will be much appreciated.

Ex of failure:

[(141.09530289033592, <treeNode.treeNode object at 0x7f08bb840fd0>), (484.8315227978057, <treeNode.treeNode object at 0x7f08bb849da0>), (390.0514031446352, <treeNode.treeNode object at 0x7f08bb840e48>)]

It only puts in the first position de lowest value, which does make sense since using a priority queue but then the following ones are not sorted by the custom method I wanna declare.

a random float generated when creating an instance of node class — jupcan, Oct 26 '18 at 18:16
yes, it is but the insert method is defined in the frontier class — jupcan, Oct 26 '18 at 18:23
`heapq` does **not** sort anything. It creates a heap. Since `141 < 484` and `141 < 390` you have a valid min-heap. If you want to insert an element to keep the list sorted you want to use the `bisect` module. — Bakuriu, Oct 26 '18 at 18:37
yeah but i think there is a way to obtain a ordered insertion in a heapq by overriding the __lt__ method python uses by default and that's what im trying to achieve — jupcan, Oct 26 '18 at 18:39
No there isn't. The order of the elements in the heap depends on the order of insertions. That is the difference between using `bisect` and `heap`. I don't get why you want to use a heap. `heappush` is O(logn) exactly like `bisect` functions but it does not give the guarantee you want. What gives? Heaps are more efficient if you first have the full list, then call `heapify` to obtain the heap in O(n) time which obviously need not be sorted (otherwise you have an O(n) sorting algorithm). — Bakuriu, Oct 26 '18 at 18:45
If you want the elements sorted the only way is to insert them in increasing order. — Bakuriu, Oct 26 '18 at 18:45
@Bakuriu now I see, thanks for explaining. The reason why trying to use heap is because of the insertion time and total number of elements I can insert, thought there was a way to use it along with and overrided method to make an ordered inser similar to the compareTo java has for priotiy queues — jupcan, Oct 26 '18 at 18:50
I'm trying to see what is the most optimal data structure tu use to get the frontier ordered in real time, no by calling a sort method of the list but I guess bisect is the way to go then — jupcan, Oct 26 '18 at 18:51
You are welcome. Choosing the correct data structure for your problem is one of the skills you should learn. The choice can make a huge impact on the performance of an algorithm (or even its design). Sometimes we get clever ideas about this. Most of the time they are correct. Sometimes we forget a little details that can break them. So it's important to check that the data structure 1) has the properties you want and 2) has the space-time complexities you expect. — Bakuriu, Oct 26 '18 at 19:11
@Bakuriu one last thing, do you know why doing a bisect.insort(self._frontier, (node._f, node)) complexity is exponential? getting lots of objects in a few seconds, but then not that many over time. since it is a real time order insert it should be more or less linear right? — jupcan, Oct 26 '18 at 19:57
Uhm, that should not happen. However indeed right now trying to run with `python3 -m timeit -s 'import bisect as B; import random as R;seq=[]' 'for _ in range(100000):B.insort(seq, R.randint(0, 1000000))'` I see that with 10k insertions is all fine (80ms and up to that point it basically scales linearly [keep in mind that it is O(nlog n) so it's a little bit worse than linear]) but with 100k it takes forever instead of 10 times more. This could make an interesting second question. A list of 100k elements isn't really that big and log(100k) is 16 so it's not that big. — Bakuriu, Oct 26 '18 at 20:35
@Bakuriu so I guess it is not a problem on my end right? I should use millions of objects so that's why maybe I'm getting those times as you did with the 100k list but not sure which data structure or way to go is the best atm :/ — jupcan, Oct 26 '18 at 20:50
did you happen to ask a question related to this performance problem? I would be interested in seeing the cause of that. — Bakuriu, Oct 27 '18 at 08:39
@Bakuriu I did not yet, but I will do, also quite interested on it — jupcan, Oct 27 '18 at 13:38
@Bakuriu have just posted it :) https://stackoverflow.com/questions/53023380/bisect-insort-complexity-not-as-expected — jupcan, Oct 27 '18 at 15:20
Yeah I see. In the end it was simple, it's just the time of the allocations. However I find it strange that it increase that much. It seems more than quadratic to me, but maybe it's just that the structures start to not fit in cache etc. so all memory accesses slow down. — Bakuriu, Oct 27 '18 at 18:37
@Bakuriu yes, that could be what happens. at the end it seems like python sorted function its kinda optimal and I thought it wasnt like in other programming languages, good to know — jupcan, Oct 28 '18 at 12:27

Dani Mesejo · Accepted Answer · 2018-10-26T18:47:23.093

1

As @Bakuriu mentioned the heapq only mantains the invariant of the heap, if you want to obtain the elements in order use nsmallest, for instance:

import heapq


class TreeNode:
    def __init__(self, f):
        self.f = f

    def __repr__(self):
        return 'f: {}'.format(self.f)

    def __lt__(self, other):
        return self.f < other.f


class Frontier:
    def __init__(self):
        self.frontier = []

    def insert(self, node):
        if isinstance(node, TreeNode):
            heapq.heappush(self.frontier, (node.f, node))
        else:
            print("Error. It is not a node.")

    def __len__(self):
        return len(self.frontier)


t = Frontier()

for n in [TreeNode(141.09530289033592), TreeNode(484.8315227978057), TreeNode(390.0514031446352)]:
    t.insert(n)

print(heapq.nsmallest(len(t), t.frontier))

Output

[(141.09530289033592, f: 141.09530289033592), (390.0514031446352, f: 390.0514031446352), (484.8315227978057, f: 484.8315227978057)]

edited Oct 26 '18 at 18:47

answered Oct 26 '18 at 18:45

Dani Mesejo

61,499
6
49
76

You don't need to `heapify` an empty list. Anyway: this code would be exactly as efficient using `bisect` instead of `heapq` and it wouldn't require using `nsmallest`. BTW: `nsmallest` calls `heapify` so `t.frontier` need not even be a heap at all. – Bakuriu Oct 26 '18 at 18:47
@Bakuri I updated the code and I agree with you, the efficiency is the same as using bisect. Only posted the way the OP could solve his issue. – Dani Mesejo Oct 26 '18 at 18:50
thanks for your comment. bisect.insort() will be equivalent to it right? – jupcan Oct 26 '18 at 18:56
@jupcan Yes it will. – Dani Mesejo Oct 26 '18 at 18:56
1

@jupcan you could also look at http://www.laurentluce.com/posts/binary-search-tree-library-in-python/ – Dani Mesejo Oct 26 '18 at 19:03

heapq custom compareTo

1 Answers1