
I have a long list of unsorted integers, and I am iterating over the list; on each iteration, I want to obtain (and remove) the maximum. This would result in a time complexity of n + (n-1) + (n-2) + ... = O(n^2).

The alternative is to sort the entire list, which I believe takes O(n log n). Then I can simply pop from the end; n pops would be O(n) in total, so this won't affect the overall complexity.

So am I right in saying that sorting the list would be more efficient than taking the maximum each time?
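For concreteness, the two approaches could be sketched like this (function names are my own, illustrative only):

```python
def repeated_max(nums):
    """O(n^2): scan for the max and remove it on each pass."""
    nums = list(nums)           # copy so we can remove destructively
    out = []
    while nums:
        m = max(nums)           # O(n) scan
        nums.remove(m)          # O(n) removal from a Python list
        out.append(m)
    return out

def sort_then_pop(nums):
    """O(n log n): sort once, then pop from the end."""
    nums = sorted(nums)         # O(n log n)
    out = []
    while nums:
        out.append(nums.pop())  # pop() from the end is O(1)
    return out

print(repeated_max([3, 1, 4, 1, 5]))   # [5, 4, 3, 1, 1]
print(sort_then_pop([3, 1, 4, 1, 5]))  # [5, 4, 3, 1, 1]
```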

wjandrea
  • Use a max-heap for this purpose; check this: https://docs.python.org/3/library/heapq.html#heapq.nlargest – Epsi95 Aug 28 '21 at 16:28
  • Yes, you're right. – Thierry Lathuille Aug 28 '21 at 16:30
  • Why are you uncertain about your reasoning? Seems utterly straightforward. Note that if you are removing the max at each iteration, you also run into the problem that removing an arbitrary element from a Python list is an `O(n)` operation in its own right. – John Coleman Aug 28 '21 at 16:30
  • Thanks guys, and apologies if this is a basic question. I just wanted to confirm, and I couldn't find a similar question answered on SO. – Patrick_Chong Aug 28 '21 at 16:33
  • Note that "taking the max at every iteration" is basically [selection sort](https://en.wikipedia.org/wiki/Selection_sort). Selection sort is proportional to n^2, as you correctly observed, and not as efficient as other sort algorithms (and certainly less efficient than the language's default sorting function). – Stef Aug 28 '21 at 16:45

2 Answers


Time complexity is an asymptotic measure. Finding the max is O(n), and you have to do it n times. The theoretical lower bound for comparison-based sorting is O(n log n). After sorting once, finding the max of the sorted array is O(1). Sorting is the way to go.
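As a minimal illustration of the last point (values are made up): after one O(n log n) sort, the maximum is always at the end, so each lookup is O(1).

```python
data = [7, 2, 9, 4]
data.sort()          # O(n log n), sorts in place ascending
maximum = data[-1]   # O(1): largest element sits at the end
print(maximum)       # 9
```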

  • O(n log(n)) is an upper bound, not a lower bound. – Stef Aug 28 '21 at 16:47
  • @Stef To add to that, the lower bound is *Ω(n)* – wjandrea Aug 28 '21 at 16:50
  • @wjandrea Actually "the lower bound is n log(n)" and "the lower bound is n" and "the upper bound is n log(n)" all make sense, depending on precisely what we're talking about - because they are very vague statements and there is some subtlety in all the quantifiers implied by "best-case", "worst-case", "best algorithm", etc. My comment was just complaining that big-O cannot be used to express a lower bound, by definition. – Stef Aug 28 '21 at 20:27
  • In any case, I believe vague statements like "the upper bound is..." and "the lower bound is..." are not helpful, because the only people who understand their meaning are those who already know them. – Stef Aug 28 '21 at 20:27
  • @Stef How could "The theoretical lower bound for comparison based sorting" be greater than *Ω(n)*? Timsort for example is *Ω(n)* and *O(n log(n))*. – wjandrea Aug 28 '21 at 20:38
  • @Stef Why can't big-O be used for lower bounds, let alone by definition? Just like "the lower bound is n" means you can't do better than n, "the lower bound is O(n)" means you can't do better than O(n), no? – no comment Aug 28 '21 at 20:45
  • @don'ttalkjustcode because "the lower bound is O(n)" literally means "there exists a constant k such that the lower bound is less than k * n". So this sentence is placing an upper bound on the lower bound? That's just confusing. – Stef Aug 28 '21 at 20:52
  • @wjandrea "The lower bound is n log(n)" is true in the sense that there doesn't exist an algorithm that can sort every list of length n in less than n log(n). "The lower bound is n" is true in the sense that every sorting algorithm spends at least n-1 comparison on every list of length n. "The upper bound is n log(n)" is true in the sense that there exists an algorithm that can sort every list of length n in less than n log(n). All of these are things that you and I already knew, but saying "the lower bound is ..." without any more details is not helpful at all for those who didn't know them. – Stef Aug 28 '21 at 20:56
  • @Stef You seem to interpret "the lower bound is O(n)" as "(the lower bound) is an element of O(n)", whereas I mean it as "O(n) *is* the lower bound". Like, a number can be a lower bound for a set of numbers, a function can be a lower bound for a set of functions, a set can be a lower bound for a set of sets, and a complexity class can be a lower bound for a set of complexity classes. – no comment Aug 28 '21 at 22:09
  • @don'ttalkjustcode Fair enough. So O(n) is a set of functions which is a lower bound for a set of sets of functions. Which makes my point that "the lower bound is O(n)" is a confusing statement with more than one possible interpretation, and it should be avoided, especially if the people reading/listening are less experienced, or already confused about O() in general. Which a very large number of people appear to be. – Stef Aug 28 '21 at 22:29

Your analysis is on-point: it's more efficient to sort once than to take the max each time.

The only correction is that popping from the end is O(1).
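A quick sketch of why this works out (the list contents here are illustrative): `pop()` with no index removes the last element in O(1), so consuming the whole list with n pops is O(n) overall.

```python
data = sorted([3, 1, 4, 1, 5])  # [1, 1, 3, 4, 5]
consumed = []
while data:
    consumed.append(data.pop())  # pop() from the end: O(1) each time
print(consumed)  # [5, 4, 3, 1, 1], largest first
```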

wjandrea
  • But we would have to pop n times? Wouldn't that be O(n)? P.S. it wasn't me who downvoted! – Patrick_Chong Aug 28 '21 at 16:57
  • @hfdd Iterating over a list is always *O(n)*, regardless of how you do it. If you were to pop from an arbitrary point, that'd become *O(n^2)*, but popping from the end is constant, so it stays at *O(n)*. P.S. no worries about the downvote in any case. I care more about learning than points :) – wjandrea Aug 28 '21 at 17:05
  • *"The only correction is that popping from the end is O(1)."* I believe the OP meant that popping **several elements** from the end is O(n) **total**, as evidenced by the end of the sentence: *"so this won't affect the overall complexity"* – Stef Aug 28 '21 at 20:01
  • @Stef You might be right. It'd be easier to tell with more context. I assumed OP wants to consume the whole list, but maybe they only want to consume *k* items, in which case actually [`heapq.nlargest(k, list)`](https://docs.python.org/3/library/heapq.html#heapq.nlargest) might be better, at something like *O(n log k)*. – wjandrea Aug 28 '21 at 20:09
  • @wjandrea Do you have a source that explains the complexity of `heapq.nlargest(k, list_of_length_n)`? Intuitively, I would guess it's O(n + k log(n)) since I guess it requires first building a heap of size n then extracting k elements. – Stef Aug 28 '21 at 20:13
  • @Stef No, that was more of a guess. *O(n + k log(n))* would make more sense. – wjandrea Aug 28 '21 at 20:14
  • @Stef [O(n log k)](https://github.com/python/cpython/blob/1046cd06b0e2f20b3be93de83d49b684956af98d/Lib/heapq.py#L398-L399). – no comment Aug 28 '21 at 20:58
  • @don'ttalkjustcode Although theoretically, that means that if `n` and `k` are both super large integers, but `n >> k`, then I shouldn't use `heapq.nlargest(k, list_of_length_n)`, and rather I should call `heapq.heapify(list_of_length_n)` then extract k elements myself? Because n+k log(n) < n log(k) in that particular situation. – Stef Aug 28 '21 at 21:04
  • @Stef Possibly. Also depends on your values. There are some arguments and a link at the end of that comment block. One argument for the current implementation is that in your case, with random/normal data, it's more like O(n) because most elements don't make it into the k-heap at all, rejected by one comparison. – no comment Aug 28 '21 at 21:26
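The `heapq.nlargest` alternative discussed in the comments above can be sketched as follows (the data values are made up for illustration):

```python
import heapq

data = [3, 1, 4, 1, 5, 9, 2, 6]
# Roughly O(n log k): maintains a k-element heap while scanning the list,
# instead of fully sorting it (O(n log n)).
top3 = heapq.nlargest(3, data)
print(top3)  # [9, 6, 5]
```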