0

If I have a list and need to sort it, is there a good reason to use list.sort() over heapq.heapify(list), given that heapify is O(N) (link) and .sort() is O(NlogN)?

Why isn't the default sort algorithm heapify if heapify is faster?

sollyc123
  • 107
  • 1
  • 6
  • 1
    `heapq.heapify` does not sort the list, it turns it into a binary-tree/heap – Iain Shelvington Nov 09 '22 at 18:17
  • 2
    because `heapq.heapify(list)` **does not sort a list**. It *heapifies it* – juanpa.arrivillaga Nov 09 '22 at 18:18
  • 1
    Try `x = [5,4,3,2,1]` and then `heapq.heapify(x)`. – j1-lee Nov 09 '22 at 18:20
  • 1
    Heapsort has to heapify the list, then remove each item at a cost of O(lg n) per item. It's still O(n lg n), but with a much larger constant because it's not optimized like the built-in sort it. – chepner Nov 09 '22 at 18:20
  • 1
    Note, if you *did* want to do a heapsort, the [docs give you an example of how](https://docs.python.org/3/library/heapq.html#basic-examples), which should make it obvious that heapsort is O(N*logN). But there is no good reason to use heap sort as a default. For starters, heap sort is not stable. But also, Python's built-in `sorted`/`list.sort()` is a Timsort, a highly tuned adaptive mergesort that is going to be hard to beat in a general case. – juanpa.arrivillaga Nov 09 '22 at 18:21
  • 1
    Intuitively, heapify builds a binary tree with the heap property, which says that a node is larger (in the case of a maxheap) than both its children. Unlike a binary search tree, where a node and its two children are fully sorted (`left < root < right` is guaranteed), the heap property is satisfied by either `left < right < root` or `right < left < root`. With a binary search tree, it takes O(n lg n) time to build, but then traversing it in sorted order is O(n). With a heap, building it is O(n) but "traversing" it in sorted order is O(n lg n). – chepner Nov 09 '22 at 18:28
  • @juanpa.arrivillaga: heapsort is useful if you will only need the first few sorted elements but you don't know in advance how many you will need. (Think about searching page rank in order until you find a suitable page.) The important thing is understanding the use cases for each algo. (Except bubblesort. Bubblesort's only use case is to confuse impressionable freshers.) – rici Nov 09 '22 at 19:50
  • @rici That's not heapsort; that's just building a heap and taking a constant number of items off the heap. Heapsort is, by definition, removing *all* the items from the heap. – chepner Nov 09 '22 at 19:57
  • @chepner: if I needed a known fixed number, I could use quick-partial-sort, which is essentially the same algorithm as quicksort with some recursions eliminated. But it's difficult to restart. With heapsort, I can start emitting the sorted values before the sort is finished, without disturbing the process. It's not a different algorithm; I could finish the sort or not, but I'm in no way changing the algorithm; only observing that partial results are available before the algorithm terminates. Early availability increases parallelism and allows for the possibility of early termination. – rici Nov 09 '22 at 22:02

1 Answers1

1

as guys have said in the comment section, "heapify" rearranges elements of the list to form a heap-oriented binary tree. complete binary trees can be represented using arrays or in other words, any array which is fully filled with comparable values can be logically viewed as a complete binary tree.

by definition from wikipedia, complete binary tree is:

A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible.

see an example: a = [5, 3, 8, 2, 4, 6, 10] ---> heapq.heapify(a)

complete binary tree to heap

to form a heap you need to satisfy 2 rules:

  1. form a complete binary tree
  2. sort the tree so that the children of each node are greater than or equal to the parent node for each node in the tree.

that's exactly what heapify does, it rearranges elements in the array to satisfy those 2 conditions.

as you can see in the picture above, just by taking the top element (root) each time, you'll access tree elements in a sorted fashion or in other words you can use heaps to sort your array in n*logn time:

import heapq

a = [5, 3, 8, 2, 4, 6, 10]
    
def sort_with_heap(a):
    """
    sort O(nlogn)
    """
    heapq.heapify(a) # heapify's the list O(n)

    while len(a) > 0: # O(n) times
        yield heapq.heappop(a) # O(logn)


print(list(sort_with_heap(a)))
>> [2, 3, 4, 5, 6, 8, 10]

note: that the example above is for the sake of understanding how heaps can be used for sorting and is not space optimal. to do proper sorting using heaps take a look at heapsort (which works with max-oriented heaps) and doesn't use any additional space than the sorting array itself. i've wrote one myself in here.

aleksandarbos
  • 496
  • 4
  • 12
  • Sort is not usually O(n²), except possibly in introductory computing classes which inexplicably still teach bubblesort. (If your first programming exercise was bubblesort, your second exercise should be forgetting that you ever learned it.) All standard library sort functions are O(n log n). – rici Nov 09 '22 at 19:44
  • agree, made an update to the answer. thanks – aleksandarbos Nov 09 '22 at 19:46
  • 1
    O(n^2) sorting algorithms are still useful in some circumstances, particularly when it is known that the list will not be very long. Insertion sort is O(n^2) but can be implemented with a quite low constant, if you search for the correct insertion point by binary search and then do the insertion with a bulk memory move. – kaya3 Nov 09 '22 at 19:57
  • for sorted data bubble sort is close to the best algorithm. being O(n) – BevynQ Nov 09 '22 at 19:59
  • In fact, Timsort (the algorithm used by the built-in `sorted` function and `list.sort`) *is* insertion sort for small lists. – chepner Nov 09 '22 at 20:00
  • 1
    @BevynQ Insertion sort equals or beats bubble sort in pretty much every case, including already-sorted data. – kaya3 Nov 09 '22 at 20:00
  • @BevynQ Insertion sort is also O(n) for sorted lists. – chepner Nov 09 '22 at 20:00
  • This is not the forum for discussing the merits of each algorithm but there are use cases for bubble sort, pretty sure the doom engine used it. And teaching it in an introduction to algorithms is not inexplicable. Point I was trying to make is just using timsort or merge sort or any other sort depends on the actual situation and there is no one best sort. – BevynQ Nov 10 '22 at 19:09