0

Merging k sorted lists containing a total of n elements using heapq.merge is supposed to have a complexity of O(n * logk). This is not directly listed in the documentation but can be inferred from the Wikipedia article mentioning the direct k-way merge. It also seems fairly intuitive - you create a heap of the k top elements and then pop the root of that heap and push onto the heap the next element from the same list - and repeat this till you get the heap (and the lists feeding to it) empty.

What bugs me is that the complexity of this algorithm is higher than that of heapq.heapify if the latter is applied on the same number of elements n supplied in a single unsorted list. The latter complexity is known to be O(n)

This does not make sense - it should be the other way round. It should be more difficult to heapify n unordered elements than to heapify the same elements as sorted in k lists.

What am I missing here?

Nick
  • 2,924
  • 4
  • 36
  • 43
  • Could you provide the source in which you found these complexities listed? – Berthur Feb 08 '23 at 16:21
  • @Berthur Thanks - just edited my post to add references. – Nick Feb 08 '23 at 17:19
  • Thank you, that provides a bit more clarity as to what you are confused about. I attempted an answer. Did I understand correctly where your confusion lies? – Berthur Feb 08 '23 at 19:20

1 Answers1

2

Direct k-way merge produces a sorted array from your input of sorted arrays.

Creating a heap from all your n elements in unsorted order produces, well, a heap.

A heap is not a sorted list; in fact you need to do a lot of work to produce a sorted list from a heap, as discussed in articles about heapsort, which is an O(n log n) sorting algorithm. So creating the heap may be O(n), but the output is different to that of k-way merge. In this context, you may view it as weaker than the already sorted array. Thus, it makes sense that this time complexity is smaller than that of k-way merge.

Berthur
  • 4,300
  • 2
  • 14
  • 28
  • Thanks, this makes sense. It seems that heapq.merge can still have a lower complexity than heapq.heapify - but is not clear what this will be. It is probably best for me to post this as a separate question. – Nick Feb 09 '23 at 09:45
  • 1
    @Nick Glad it was of some help. Not sure why you think that though. Notice that `heapq.merge` does more than just merge heaps: It merges sorted arrays into one big sorted array. – Berthur Feb 09 '23 at 14:45
  • 1
    you are right, the output of `heapq.merge` is not a heap. Not sure why I thought it is. This clears it. – Nick Feb 10 '23 at 11:29
  • @Nick Yes, a bit unclear naming for that function I agree. Your intuition seems right though: Merging *k* heaps would feel faster than creating one from scratch. And definitely faster than *O(n log k)*. It gets a bit complicated though, see e.g. https://cs.stackexchange.com/questions/82793/merge-2-binary-heaps . – Berthur Feb 10 '23 at 11:39