What is the Time Complexity of finding the max k integers?

Question

def max_k_sort(k, nums):
    # sort nums first using timsort
    # add O(n*log(n)) time complexity
    sorted_nums = sorted(nums)

    return sorted_nums[-1*k:len(nums)]

def max_k(k, nums):
    # build initial max number list
    max_nums = {}

    # add O(k) time complexity?
    i = 0
    while i < k:
        max_nums[i] = 0
        i += 1

    # add O(n) time complexity?
    least_max_key = min(max_nums, key=max_nums.get)
    least_max = max_nums[least_max_key]

    # add O(n) time complexity?
    for n in nums:
        if n > least_max:
            max_nums[least_max_key] = n
            least_max_key = min(max_nums, key=max_nums.get)
            least_max = max_nums[least_max_key]

    return max_nums.values()

print(max_k(5, [2, 8, 4, 9, 0, 12, 12, 6, 5]))

I'm quite unsure of the time complexity of this code. The task is to return the max k numbers from an unsorted integer array. Each number in the array is in the range [0, 10000). My goal is to have one obvious solution max_k_sort(k, nums) that does the task in O(n*log(n)) time complexity and another method max_k(k, nums) that does the task in O(n) time complexity where n is the number of integers passed and k is the number of max values to find. I can't help but wonder if there's a way to return the max values sorted in O(n) time complexity.

"I can't help but wonder if there's a way to return the max values sorted in O(n) time complexity." - as long as you're working with comparisons, no dice. If there was, you could just pass `k=n` to sort the array in O(n) time, and you can't distinguish n! possible input orders with O(n) comparisons. — user2357112, May 23 '15 at 22:16
`max_k` is `O(n)`? I am not very sure. What about the `min` within the `for`? — thefourtheye, May 23 '15 at 22:21
Also, do you actually need the max k values in sorted order, or do you just need the max k values? If you don't care what order they're in, you can use an O(n) [selection algorithm](http://en.wikipedia.org/wiki/Selection_algorithm) to pick the kth-highest value and then build an array of every element >= that value. — user2357112, May 23 '15 at 22:36
@user2357112, using selection algorithm seems perfect answer. But if we solve it by selection algorithm, we didn't utilized the information that problem statement give us which is range of the numbers. "Each number in the array is in the range [0, 10000)." — Hengameh, Aug 19 '15 at 12:13

score 9 · Accepted Answer · edited May 23 '17 at 12:07

for n in nums:
        if n > least_max:
            max_nums[least_max_key] = n
            least_max_key = min(max_nums, key=max_nums.get) # this is O(k)
            least_max = max_nums[least_max_key]

You're doing an O(k) operation n times, so the complexity of your second function is O(n*k).

Assuming you want the output in sorted order, this can be done most easily in O(n*log(k)) by creating a k-sized heap and pushing everything onto it. This is implemented for you in heapq.nlargest.

import heapq

heapq.nlargest(5, [2, 8, 4, 9, 0, 12, 12, 6, 5])
Out[4]: [12, 12, 9, 8, 6]

If you don't want the output in sorted order, this can technically be done in O(n). There exist algorithms (and python implementations) to find the kth largest element in an array in linear time; it's easy to see that one more pass through the array would allow you to build an array of all numbers k and larger, giving overall O(n).

"The mathematical lower-bound for this operation is `O(n*log(k))`" - not quite. It can be done in `O(n+k*log(k))` by using a selection algorithm like [introselect](http://en.wikipedia.org/wiki/Introselect) to pick the kth-highest element, then building an array of the top k elements and sorting it. If the order of the array is unimportant, we can bring it down to `O(n)` time. — user2357112, May 23 '15 at 22:35

dawg · Answer 2 · 2015-05-24T04:51:00.643

The time complexity of list operations in Python states list sorts are O(N log N).

Slices are O(k)

So:

def max_k(k, nums):
    nums.sort(reverse=True)
    return nums[0:k]

O(k) + O(n log n) is O(n log n) where O(k) is less than O(n log n)

>>> max_k(5, [2, 8, 4, 9, 0, 12, 12, 6, 5])
[12, 12, 9, 8, 6]

As a practical matter, try timing them:

import heapq
def max_k1(k, nums):
    nums.sort(reverse=True)
    return nums[0:k]

def max_k2(k, nums):
    return heapq.nlargest(k, nums)    

if __name__ == '__main__':
    import timeit
    for f in (max_k1, max_k2):
        li=[2, 8, 4, 9, 0, 12, 12, 6, 5]
        print f.__name__, timeit.timeit('f(5, li)', setup='from __main__ import f, li')

Prints:

max_k1 0.240165948868
max_k2 4.96488595009

So sort and slice is 20x faster than heapq.

Based on the comment:

import heapq
def max_k1(k, nums):
    nums.sort(reverse=True)
    return nums[0:k]

def max_k2(k, nums):
    return heapq.nlargest(k, nums)   

def max_k3(k, nums):
    return sorted(nums, reverse=True)[0:k]    

if __name__ == '__main__':
    import timeit
    for f in (max_k1, max_k2, max_k3):
        li=[2, 8, 4, 9, 0, 12, 12, 6, 5]
        print f.__name__, timeit.timeit('f(5, li)', setup='from __main__ import f, li')    

max_k1 0.242296934128
max_k2 4.52635192871
max_k3 0.332237005234

I believe the test is flawed. `nums.sort` sorts the `nums` in-place. So, from the next `max_k`, `nums` will be already sorted. Try using `sorted(nums, reverse=True)[:k]`. Apart from that, quoting the docs, `The latter two functions(nlargest, nsmallest) perform best for smaller values of n. For larger values, it is more efficient to use the sorted() function. Also, when n==1, it is more efficient to use the built-in min() and max() functions. If repeated usage of these functions is required, consider turning the iterable into an actual heap.` — thefourtheye, May 24 '15 at 02:53
Uh, constant factors are going to dominate your benchmarks when n is so tiny. Of course natively-implemented functions are going to win when n=10. Try n=100000, k=5. — roippi, May 24 '15 at 11:49

What is the Time Complexity of finding the max k integers?

2 Answers2