In big O notation, we are taught to ignore constant factors and lower-order terms in most cases. That is, rather than writing,

3n^2-100n+6

we are almost always satisfied with

n^2

since n^2 is the fastest-growing term in the expression.
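For example, a quick numeric check (Python here, purely for illustration) shows the n^2 term taking over as n grows:

    # Compare the full polynomial against its dominant term.
    def f(n):
        return 3 * n**2 - 100 * n + 6

    for n in [10, 100, 1000, 10000]:
        print(n, f(n) / (3 * n**2))  # the ratio tends to 1 as n grows

For small n the lower-order terms clearly matter (f(10) is even negative), but the ratio approaches 1, and big O then also drops the constant factor 3.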

But I have found that many algorithm courses start by comparing functions with many terms,

2n^2 + 120n + 5 = O(n^2)

then finding c and n0 for those long functions, before finally recommending that the lower-order terms be ignored.

My question is: what would I gain from trying to understand and analyse these kinds of functions with many terms? So far I am comfortable with what O(1), O(n), O(log n), and O(n^3) mean. But am I missing important concepts if I just rely on these typically used functions? What will I miss if I skip analysing those long functions?

lightning_missile
  • So what exactly is the question? – Jiri Kremser Dec 22 '15 at 17:50
  • In general, or educationally? In your toy example, 2n^2 only dominates after n=7 or so... so the lower terms matter for n<7... A real life example is matrix multiplication; the fastest theoretical algorithms are not, by far, the fastest practical algorithms – en_Knight Dec 22 '15 at 17:50
  • @en_Knight in general. I think I can get through with just understanding what functions like O(n) mean; I'm wondering if there are other benefits to learning those long functions, aside from exercises in algebra – lightning_missile Dec 22 '15 at 18:05
  • @morbidCode if you want to derive complexities, you'll need to know how (see the answers); even if you just want to memorize the results, you still (sometimes) need to understand the long functions: for example https://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm , practically speaking – en_Knight Dec 22 '15 at 18:34
  • It might be good if you looked at [my answer to a similar question](https://stackoverflow.com/questions/32312987/how-can-an-algorithm-is-of-on-also-be-on2-on1000000-o2n/32313092#32313092), with the understanding that a larger `c` will make the slopes steeper. – erip Dec 22 '15 at 18:44

2 Answers

Let's first of all describe what we mean when we say that f(n) is in O(g(n)):

... we can say that f(n) is O(g(n)) if we can find a constant c such that f(n) is less than c·g(n) for all n larger than n0, i.e., for all n > n0.

In equation form: we need to find one set of constants (c, n0) that fulfils

f(n) < c · g(n), for all n > n0,                        (+)
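As an aside, a candidate pair (c, n0) can be spot-checked numerically; this is only a sanity check over a finite range, never a proof, and the helper below (a sketch, with names of my own choosing) is there purely for illustration:

    def check_pair(f, g, c, n0, n_max=100_000):
        # Spot-check (+): f(n) < c*g(n) for all n with n0 < n <= n_max.
        # A finite check can only falsify the claim or lend it support;
        # the actual proof still needs the algebra that follows.
        return all(f(n) < c * g(n) for n in range(n0 + 1, n_max + 1))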

Now, the result that f(n) is in O(g(n)) is sometimes presented in different forms, e.g. as f(n) = O(g(n)) or f(n) ∈ O(g(n)), but the statement is the same. Hence, from your question, the statement 2n^2 + 120n + 5 = O(n^2) is just:

    f(n) = 2n^2 + 120n + 5

a result after some analysis: f(n) is in O(g(n)), where

    g(n) = n^2

OK, with this out of the way, let's look at the constant term in the functions we want to analyse asymptotically, and let's do so educationally, using your example.


As the result of any big-O analysis is the asymptotic behaviour of a function, in all but some very unusual cases the constant term has no effect whatsoever on this behaviour. The constant term can, however, affect how we choose the constant pair (c, n0) used to show that f(n) is in O(g(n)) for some functions f(n) and g(n), i.e., the non-unique constant pair (c, n0) used to show that (+) holds. We can say that the constant term has no effect on the result of the analysis, but it can affect our derivation of this result.

Let's look at your function as well as another, related function:

f(n) = 2n^2 + 120n + 5                                        (x)
h(n) = 2n^2 + 120n + 22500                                    (xx)

Using an approach similar to the one in this thread, we can show, for f(n):

linear term: 

    120n < n^2 for n > 120 (verify: 120n = n^2 at n = 120)    (i)

constant term:

    5 < n^2 for e.g. n > 3 (verify: 3^2 = 9 > 5)              (ii)

This means that if we replace both 120n and 5 in (x) by n^2, we can state the following inequality:

Given that n > 120, we have:
2n^2 + n^2 + n^2 = 4n^2 > {by (i) and (ii)} > 2n^2 + 120n + 5 = f(n)  (iii)

From (iii), we can choose (c, n0) = (4, 120), and (iii) then shows that these constants fulfil (+) for f(n) with g(n) = n^2, and hence

result: f(n) is in O(n^2)
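As a numeric sanity check (using the hypothetical check_pair sketch from above):

    f = lambda n: 2 * n**2 + 120 * n + 5
    g = lambda n: n**2

    print(check_pair(f, g, c=4, n0=120))  # True: f(n) < 4*n^2 held for every n checked above 120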

Now, for h(n), we analogously have:

linear term (same as for f(n)):

    120n < n^2 for n > 120 (verify: 120n = n^2 at n = 120)    (I)

constant term:

    22500 < n^2 for e.g. n > 150 (verify: 150^2 = 22500)      (II)

In this case, we replace both 120n and 22500 in (xx) by n^2, but we need a larger lower bound on n for these replacements to hold, namely n > 150. Hence, the following holds:

Given that n > 150, we have:
2n^2 + n^2 + n^2 = 4n^2 > {by (I) and (II)} > 2n^2 + 120n + 22500 = h(n)  (III)

In the same way as for f(n), we can here choose (c, n0) = (4, 150), and (III) then shows that these constants fulfil (+) for h(n), with g(n) = n^2, and hence

result: h(n) is in O(n^2)

Hence, we have the same result for both functions f(n) and h(n), but we had to use different constants (c, n0) to show it (i.e., a somewhat different derivation). Note finally that:

  • Naturally, the constants (c, n0) = (4, 150) (used in the h(n) analysis) are also valid for showing that f(n) is in O(n^2), i.e., that (+) holds for f(n) with g(n) = n^2.
  • However, the reverse does not hold: (c, n0) = (4, 120) cannot be used to show that (+) holds for h(n) (with g(n) = n^2), as the quick check below illustrates.
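Numerically, again with the hypothetical check_pair sketch (a sanity check, not a proof):

    h = lambda n: 2 * n**2 + 120 * n + 22500
    g = lambda n: n**2

    print(check_pair(h, g, c=4, n0=150))  # True: the pair from (III) holds
    print(check_pair(h, g, c=4, n0=120))  # False: e.g. h(130) = 71900 > 4*130**2 = 67600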

The core of this discussion is that:

  1. As long as you look at sufficiently large values of n, you will be able to bound the constant terms by relations of the form constant < dominantTerm(n), where, in our example, the dominant term is n^2.
  2. The asymptotic behaviour of a function will not (in all but some very unusual cases) depend on the constant terms, so we might as well skip looking at them at all. However, for a rigorous proof of the asymptotic behaviour of some function, we also need to take the constant terms into account.
dfrib
  • "The asymptotic behaviour of a function will not (in all but some very unusual cases) depend on the constant terms, so we might as well skip looking at them at all." - so basically, professors write those long functions because of mathematical rigor and prufes? So if I'm not mathematically inclined I can just skipped the part where these long functions are discuss and just go on to understand the meaning of tipically used functions like O(1), O(n), etc? I'm asking because as you surely know by now, I absolutely suck at algebra :) – lightning_missile Dec 22 '15 at 18:39
  • It would be advice my own old professors would scold me for, but generally: yes, you can look at just the highest-order terms. **However**, the reason our professors try to teach us to rigorously show relations such as these is to give us a sound understanding of the method at hand. So take the statement above with a grain of salt: unless you really understand the underlying theory, simplifications *can* be dangerous. Also, for all our examples so far, we've only looked at polynomials, where the higher-order term is quite easy to identify. For more complex functions, this is not always the case. – dfrib Dec 22 '15 at 18:46
  • @morbidCode Also, take a look at JB King's answer below; I think it summarises well that simplifications are usually quite difficult to perform unless you are experienced with the problem or method at hand (specifically: what the "higher-order terms" are is not always apparent) – dfrib Dec 22 '15 at 18:48
  • It seems there's no escaping it; I'll have to get back to math. Thanks. – lightning_missile Dec 22 '15 at 18:53

Ever have intermediate steps in your work? That is likely what this is: when you are computing a big O, chances are you don't already know for sure what the highest-order term is, so you keep track of them all and then determine which complexity class makes sense at the end. There is also something to be said for understanding why the lower-order terms can be ignored.


Take some graph algorithms, like minimum spanning tree or shortest path. Now, just by looking at an algorithm, can you know what the highest-order term will be? I know I wouldn't, so I'd trace through the algorithm and collect a bunch of terms.

If you want another example, consider sorting algorithms and whether you want to memorize all their complexities or not. Bubble Sort, Shell Sort, Merge Sort, Quick Sort, Radix Sort and Heap Sort are a few of the more common algorithms out there. You could either memorize both the algorithm and its complexity, or memorize just the algorithm and derive the complexity from the pseudocode if you know how to trace it.
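To illustrate that last point, here is a small sketch (Python, with names of my own choosing) that counts the comparisons a plain bubble sort makes, so the "long function" falls out of the trace instead of being memorized:

    def bubble_sort_comparisons(n):
        # The inner loop runs (n-1) + (n-2) + ... + 1 times in total,
        # i.e. n*(n-1)/2 = 0.5*n^2 - 0.5*n: a polynomial with several
        # terms, whose highest-order term tells us the sort is O(n^2).
        count = 0
        for i in range(n - 1):
            for j in range(n - 1 - i):
                count += 1  # one comparison of adjacent elements
        return count

    print(bubble_sort_comparisons(10))   # 45 == 10*9/2
    print(bubble_sort_comparisons(100))  # 4950 == 100*99/2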

JB King