1

I was solving this leetcode problem link, and found an amazing solution using heapq module, the the running time of this function is very less. This is below program:

from itertools import islice
import heapq

def nlargest(n, iterable):
    """Find the n largest elements in a dataset.

    Equivalent to:  sorted(iterable, reverse=True)[:n]
    """
    if n < 0:
        return []
    it = iter(iterable)
    result = list(islice(it, n))
    if not result:
        return result
    heapq.heapify(result)
    _heappushpop = heapq.heappushpop
    for elem in it:
        _heappushpop(result, elem)
    result.sort(reverse=True)
    return result

print nlargest(5, [10, 122, 2, 3, 3, 4, 5, 5, 10, 12, 23, 18, 17, 15, 100, 101])

This algorithm is really clever and you can also do the visualize here LINK

But I am having a hard time understanding the time complexity of the whole algorithm. Here is my analysis, and please correct me if I am wrong!

Time Complexity :

result = list(islice(it, n)) - > O(n)

heapq.heapify(result) -> O(len(result)) 

for elem in it:
        _heappushpop(result, elem)  -> I am confused at this part

result.sort(reverse=True) -> O(len(result)*log(len(result)))

Could anyone help me understand the overall time complexity of the algorithm.

python
  • 4,403
  • 13
  • 56
  • 103

1 Answers1

1

So you have two relevant paramaters here: n (the number of items to return), and, say, M (the number of items in the dataset).

islice(it, n) -- O(n)
heapify(result) -- O(n), because len(result)=n
for elem in it: _heappushpop(result, elem) -- performing M-N times an operation of O(logn), because len(result) remains n, i.e. (M-N)*logn
result.sort(reverse=True) -- O(n*logn)

Overall:

n + n + (M-n)*logn + n*logn

Resulting with O(M*logn). You can easily see the dominant part is the heappushpop loop (assuming M>>n, otherwise the problem is less interesting, because the solution is more or less reduced to sorting).


It is worth pointing out there are linear-time algorithms for solving this problem, so if your dataset is very big, it is worth checking them out.

Community
  • 1
  • 1
shx2
  • 61,779
  • 13
  • 130
  • 153
  • Why do you say pushpop will take log(n) time ? Can you elaborate this part. Push will take log(n) time and pop will take log(1) time ? – python Nov 11 '15 at 04:13
  • And what about rearranging after pop? – python Nov 11 '15 at 04:14
  • @python, I'm not sure what you're asking. Where is there a pop? there's only heappushpop – shx2 Nov 11 '15 at 04:17
  • Yes so here two operation will be performed right ? Heappush and Heappop? – python Nov 11 '15 at 04:18
  • 2
    `heappushpop` is a special implementation that is faster than a separate call to push and pop. [Here is the source code](https://fossies.org/dox/Python-3.5.0/heapq_8py_source.html), see the helper function "`_siftup`". – ely Nov 11 '15 at 04:19