I have been implementing my own heap module to help me understand the heap data structure. I understand how they work and are managed, but my implementation is significantly slower than the standard python heapq module while preforming a heap sort (for list of size 100,000, heapq takes 0.6s while my code takes 2s (was originally 2.6s, cut it down to 2s by taking len() statements out of percDown and passing through the length so it doesn't have to calcuate len every time the method calls itself). Here is my implementation:
def percDown(lst, start, end, node):
#Moves given node down the heap, starting at index start, until the heap property is
#satisfied (all children must be larger than their parent)
iChild = 2 * start + 1
i = start
# if the node has reached the end of the heap (i.e. no children left),
# return its index (we are done)
if iChild > end - 1:
return start
#if the second child exists and is smaller than the first child, use that child index
#for comparing later
if iChild + 1 < end and lst[iChild + 1] < lst[iChild]:
iChild += 1
#if the smallest child is less than the node, it is the new parent
if lst[iChild] < node:
#move the child to the parent position
lst[start] = lst[iChild]
#continue recursively going through the child nodes of the
# new parent node to find where node is meant to go
i = percDown(lst, iChild, end, node)
return i
popMin: pops the minimum value (lst[0]) and reorders the heap
def popMin(lst):
length = len(lst)
if (length > 1):
min = lst[0]
ele = lst.pop()
i = percDown(lst, 0, length - 1, ele)
lst[i] = ele
return min
else:
return lst.pop()
heapify: turns a list into a heap in-place
def heapify(lst):
iLastParent = math.floor((len(lst) - 1) / 2)
length = len(lst)
while iLastParent >= 0:
ele = lst[iLastParent]
i = percDown(lst, iLastParent, length, lst[iLastParent])
lst[i] = ele
iLastParent -= 1
sort: heapsorts the given list using the methods above (not in-place)
def sort(lst):
result = []
heap.heapify(lst)
length = len(lst)
for z in range(0, length):
result.append(heap.popMin(lst))
return result
Did I mistakenly add complexity to the algorithm/heap creation, or is it just the python heapq module being heavily optimized? I have a feeling it is the former, as 0.6s vs 2s is a huge difference.