
So I find myself taking advantage of heapq for some calculations. However, for the problem I am working on, it runs slowly because the heap gets pretty big.

I thought I had an option to speed it up. Rather than creating a giant heap, make it a heap of heaps. But, surprisingly to me, the "more efficient" code is significantly slower. There's a bit more overhead in that more efficient code, but I really thought it would win by a lot. Having stripped down the problem, I've got two functions that do the same net calculation. f1 is the "naive" (and faster) version. f2 is the "improved" (but slower) version. I do some random number generation in both, but I use the same seed, so it really is the same thing.

import random
import heapq
def f1():
    random.seed(1)
    Q = [0]
    while Q:
        value = heapq.heappop(Q)
        #print(value)
        if value < 0.5:
            for counter in range(16):
                heapq.heappush(Q, value + 0.1 + (random.random() / 1000))
    print(value)

def f2():
    random.seed(1)
    Q = [[0]]
    while Q:
        subQ = heapq.heappop(Q)
        value = heapq.heappop(subQ)
        #print(value)
        if subQ:
            heapq.heappush(Q, subQ)
        newQ = []
        if value < 0.5:
            for counter in range(16):
                newQ.append(value + 0.1 + (random.random() / 1000))
            heapq.heapify(newQ)
            heapq.heappush(Q, newQ)
    print(value)

Why does the heap of heaps (f2) run significantly slower? It should call heappush the same number of times, and heappop twice as many times. But the size of the heaps should be significantly smaller, so I expected it to run faster.
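Since both versions are driven by the same seed, one sanity check is that they really do pop exactly the same values in the same order. Below is a minimal, self-contained sketch of that check; it shrinks the threshold and branching factor from the original 0.5 and 16 so the heaps stay small and it finishes instantly, and the function names `pops_flat` / `pops_nested` are made up for illustration:

```python
import heapq
import random

def pops_flat(threshold=0.3, branch=4):
    # Flat heap: record every popped value, in pop order.
    random.seed(1)
    Q = [0.0]
    out = []
    while Q:
        value = heapq.heappop(Q)
        out.append(value)
        if value < threshold:
            for _ in range(branch):
                heapq.heappush(Q, value + 0.1 + random.random() / 1000)
    return out

def pops_nested(threshold=0.3, branch=4):
    # Heap of heaps: same calculation, same recording.
    random.seed(1)
    Q = [[0.0]]
    out = []
    while Q:
        subQ = heapq.heappop(Q)          # sub-heap with the smallest head
        value = heapq.heappop(subQ)      # global minimum
        out.append(value)
        if subQ:
            heapq.heappush(Q, subQ)
        if value < threshold:
            newQ = [value + 0.1 + random.random() / 1000 for _ in range(branch)]
            heapq.heapify(newQ)
            heapq.heappush(Q, newQ)
    return out

a, b = pops_flat(), pops_nested()
print(a == b, a == sorted(a))  # both strategies pop identical values, in ascending order
```

The reason this works: every element sits in some sub-heap whose head is no larger than it, so the minimum over the sub-heap heads is the global minimum, and the two versions consume the random stream in the same order.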

Joel
  • write big-O cost for each operation and compare e.g., heapify(newQ) is O(len(newQ)), heappop(), heappush() are O(log len(Q) * comparison cost) where comparison cost is O(1) for a float and O(len(newQ)) for nested heaps. – jfs Oct 03 '15 at 01:10
  • @J.F.Sebastian Why would a comparison of two lists be O(len(newQ))? I would expect it to just look at the first entry in the list and sort by that --- I expected it to be O(1). (in all the comparisons the lists differ in their first entry) – Joel Oct 03 '15 at 01:53
  • if they *always* differ then it is also O(1) and only the constant factor might be different. Try to modify the algorithm to increase the heap size, to see the actual growth polynomial ([example](http://stackoverflow.com/a/482848/4279)). – jfs Oct 03 '15 at 02:01
  • @J.F.Sebastian - so it actually does behave the way I was expecting. But I needed to make the subQs to be really large before the benefit I was after materialized. thanks. – Joel Oct 03 '15 at 02:30
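The point settled in the comments above can be demonstrated directly: Python compares lists lexicographically and stops at the first differing element, so comparing two large sub-heaps whose heads differ costs O(1). A small counting wrapper makes the short-circuit visible (the `Noisy` class here is an illustrative helper, not part of the original code):

```python
class Noisy:
    """Float wrapper that counts how often it is compared."""
    eq_calls = 0
    lt_calls = 0

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        Noisy.eq_calls += 1
        return self.x == other.x

    def __lt__(self, other):
        Noisy.lt_calls += 1
        return self.x < other.x

# Two 1001-element lists that differ in their first entry.
lst1 = [Noisy(0.1)] + [Noisy(i) for i in range(1000)]
lst2 = [Noisy(0.2)] + [Noisy(i) for i in range(1000)]

result = lst1 < lst2
print(Noisy.eq_calls, Noisy.lt_calls, result)
```

List comparison scans for the first unequal pair (one `__eq__` call here) and then compares that pair with `<` (one `__lt__` call); the remaining 1000 elements are never touched.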

1 Answer


So I just wasn't pushing the code hard enough. Here's some modified code. When the subQ is made very large, the benefit I was after appears.

def f1(m,n):
    random.seed(1)
    Q = [0]
    for counter in range(m):
        value = heapq.heappop(Q)
        #print(value)
        for newcounter in range(n):
            heapq.heappush(Q, random.random())
    print(value)  # should be the same for both methods, so this is just a test

def f2(m,n):
    random.seed(1)
    Q = [[0]]
    for counter in range(m):
        subQ = heapq.heappop(Q)
        value = heapq.heappop(subQ)
        #print(value)
        if subQ:
            heapq.heappush(Q, subQ)
        newQ = []
        for newcounter in range(n):
            newQ.append(random.random())
        heapq.heapify(newQ)
        heapq.heappush(Q, newQ)
    print(value)  # should be the same for both methods, so this is just a test

When I profile f1(1000000,10) and f2(1000000,10) I get run times of 10.7 seconds and 14.8 seconds. The relevant details are:

f1:

ncalls     tottime  percall  cumtime  percall  filename:lineno(function)
1000000      1.793    0.000    1.793    0.000  {_heapq.heappop}
10000000     3.856    0.000    3.856    0.000  {_heapq.heappush}

f2:

ncalls     tottime  percall  cumtime  percall  filename:lineno(function)
1000000      1.095    0.000    1.095    0.000  {_heapq.heapify}
2000000      2.628    0.000    2.628    0.000  {_heapq.heappop}
1999999      2.245    0.000    2.245    0.000  {_heapq.heappush}
10000000     1.114    0.000    1.114    0.000  {method 'append' of 'list' objects}

So, on net, f2 loses out because of the extra heappop calls, plus the heapify and append work. It does better on heappush.

But when I challenge it with a much larger inner loop and run f1(1000,100000) and f2(1000,100000), I get:

f1:

ncalls     tottime  percall  cumtime  percall  filename:lineno(function)
1000         0.015    0.000    0.015    0.000  {_heapq.heappop}
100000000   28.730    0.000   28.730    0.000  {_heapq.heappush}

f2:

ncalls     tottime  percall  cumtime  percall  filename:lineno(function)
1000        19.952    0.020   19.952    0.020  {_heapq.heapify}
2000         0.011    0.000    0.011    0.000  {_heapq.heappop}
1999         0.006    0.000    0.006    0.000  {_heapq.heappush}
100000000    6.977    0.000    6.977    0.000  {method 'append' of 'list' objects}

So f2 now does much better on heappush, enough that it runs faster overall (69 seconds versus 75 seconds).

So it turns out I just hadn't pushed my code sufficiently hard. I needed things to get large enough that many calls to heappush became slower than many calls to heapify.
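The heapify-versus-heappush tradeoff in the timings above can also be seen by counting comparisons directly, independent of wall-clock noise. The sketch below uses a counting wrapper (`Counted` is an illustrative helper, not in the original code) and deliberately feeds descending data, which is the worst case for repeated heappush into a min-heap:

```python
import heapq

class Counted:
    """Value wrapper that counts __lt__ calls."""
    lt_calls = 0

    def __init__(self, x):
        self.x = x

    def __lt__(self, other):
        Counted.lt_calls += 1
        return self.x < other.x

n = 4096
data = list(range(n, 0, -1))  # descending, so every push is a new minimum

# One heapify over the whole batch: O(n) comparisons.
Counted.lt_calls = 0
heapq.heapify([Counted(x) for x in data])
heapify_cmps = Counted.lt_calls

# n individual pushes: each new minimum sifts all the way to the root,
# so the total is O(n log n) comparisons here.
Counted.lt_calls = 0
h = []
for x in data:
    heapq.heappush(h, Counted(x))
push_cmps = Counted.lt_calls

print(heapify_cmps, push_cmps)
```

On this input, heapify stays linear in n while the pushes pay roughly log2(i) comparisons for the i-th element, which is the same effect that makes f2's batched heapify win once the sub-heaps get large.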

Joel
  • why do you use `range(1000000)` in `f2()` but `range(m)` in `f1()`? – jfs Oct 03 '15 at 02:49
  • If `m`,`n` are close then `f2()` should be worse by `O(len(newQ) / log(len(Q)))` -- you should recheck it. – jfs Oct 03 '15 at 03:03
  • Optimization for your `f2` code: Change `subQ = heapq.heappop(Q)`, `value = heapq.heappop(subQ)`, `if subQ: heapq.heappush(Q,subQ)` to: `subQ = Q[0]; value = heapq.heappop(subQ)`, `if subQ: heapq.heapreplace(Q, subQ)`, `else: heapq.heappop(Q)`. Since the common case is that you're not clearing the head of `Q`, it's more efficient to use `heapreplace` to restore the heap invariant (definitely legal too; `heapq.merge` uses the same trick of mutating the head and `heapreplace`-ing it) instead of `heappop`-ing when you're usually going to `heappush` it back on. Savings are small but consistent. – ShadowRanger Oct 03 '15 at 04:53
  • You could also move the replace or pop down, so when you have `newQ` you either replace and push, or just replace (saving an unnecessary pop/push in the emptied heap case). Of course, given how much the code is being run, I see significant savings just caching modules functions into the local namespace before starting the loop so it's direct from local stack instead of global lookup of `heapq`, then lookup of `heapq` attribute. – ShadowRanger Oct 03 '15 at 04:56
  • @J.F.Sebastian why `range(1000000)` --- failure to fix bug found after copy/paste... – Joel Oct 03 '15 at 06:16
  • @ShadowRanger Thanks - that looks useful. – Joel Oct 03 '15 at 06:17
  • @Joel: does it invalidate the results? – jfs Oct 03 '15 at 06:44
  • @J.F.Sebastian - no the typo was corrected before I did the calculations - I just hadn't corrected what I had pasted. Anyways - thanks for the help. I'm boarding a transpacific flight now... – Joel Oct 03 '15 at 07:06
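ShadowRanger's suggestion in the comments above can be sketched as follows: peek at `Q[0]` instead of popping it, and use `heapreplace` to restore the heap invariant, since the common case is that the head sub-heap goes straight back on. This is only a sketch under the assumption that every iteration pushes a fresh `newQ`, as in the answer's f2; `f2_replace` is a made-up name:

```python
import heapq
import random

def f2_replace(m, n):
    """Heap-of-heaps variant using heapreplace instead of heappop + heappush."""
    random.seed(1)
    Q = [[0.0]]
    value = None
    for _ in range(m):
        subQ = Q[0]                       # peek at the smallest sub-heap in place
        value = heapq.heappop(subQ)       # global minimum; mutates Q[0]
        newQ = [random.random() for _ in range(n)]
        heapq.heapify(newQ)
        if subQ:
            heapq.heapreplace(Q, subQ)    # re-sift the mutated head, cheaper than pop+push
            heapq.heappush(Q, newQ)
        else:
            heapq.heapreplace(Q, newQ)    # head emptied: just swap the new batch in
    return value

value = f2_replace(1000, 10)
print(value)
```

Mutating the head and `heapreplace`-ing it back is legitimate (heapq.merge uses the same trick), and it saves one sift per iteration in the common case where the head sub-heap is not emptied.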