
I'm revising the formal definitions of Big O and the other associated bounds and something is tripping me up. In the book I'm reading (Skiena) Big O is defined as:

f(n) = O(g(n)) when there exists a constant c such that f(n) is always <= c*g(n) for all n > n0 (for some constant n0)

This generally makes sense to me. We are only concerned with values of n large enough that the growth rates actually matter. But why multiply g(n) by c? It seems like I could choose a very large value for c and make the whole thing arbitrary by inflating even small values of g(n).
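To make my confusion concrete, here is a small numeric check of the definition (the functions and the values of c and n0 are just ones I picked for illustration):

```python
# Hypothetical example: f(n) = 3*n**2 + 10 against g(n) = n**2.
# Claim: c = 4 and n0 = 4 witness f(n) = O(g(n)).
def f(n):
    return 3 * n**2 + 10

def g(n):
    return n**2

c, n0 = 4, 4
# f(n) <= c*g(n) must hold for every n >= n0.
assert all(f(n) <= c * g(n) for n in range(n0, 10_000))
print("f(n) <= c*g(n) for all checked n >= n0")
```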

Secondary question: When choosing to classify an algorithm into a complexity class, is the general rule of thumb to just choose the lowest growth class that still holds according to the definition of Big O? According to the definition it seems valid to classify a constant-time algorithm as O(n!), simply because f(n) will be <= c*g(n). Of course this offers no value.

Thanks!

Justin
  • Big O notation just describes how memory/time changes with problem size. It does not tell you actual time, etc. You choose the largest term, e.g. n^2 over n. – Ed Heal Sep 04 '14 at 04:13
  • I'm not sure that answers my question. Yes, it tells you how time changes with input size, but it is supposed to provide an upper bound. Therefore you need to choose the tightest upper bound you can, right? Otherwise it's not representative of the growth at all. I'm still not sure how the constant c plays into it. – Justin Sep 04 '14 at 04:17
  • No - it describes the growth, i.e. the shape of the graph. `c` is just the stretch factor for one of the axes. It does not change the shape of the graph. – Ed Heal Sep 04 '14 at 04:22
  • O(n) = O(n^2) = O(n!), according to the definition and with the strict left-to-right reading of the peculiar "=". And yes, this doesn't offer any value. However, sometimes it's easier to determine that something is O(n^2) than to actually determine it is O(n^1.987), which allows being a bit lazy while still being accurate enough. – Ulrich Eckhardt Sep 04 '14 at 05:36
  • for future readers this may help: https://stackoverflow.com/questions/29954109/why-do-we-ignore-co-efficients-in-big-o-notation – csguy Jan 21 '20 at 02:18

2 Answers


The reason you can multiply g(n) by an arbitrary constant c is that you want to capture all functions that are only a constant factor away from f(n). In simple terms, you perform your analysis based on n, not constants, so what you are concerned with is how those functions change depending on the input size only. For instance, when you have n^3 and n, there's no way you can choose a c where c*n >= n^3 for all large n, unless c >= n^2, which isn't a constant anymore; f(n) will run away from c*g(n) as n grows.
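Here is a quick numeric sketch of that point (the values of c are arbitrary ones I picked): for f(n) = n^3 and g(n) = n, whatever constant you fix, f overtakes c*g as soon as n^2 > c:

```python
import math

# f(n) = n**3 vs c*g(n) = c*n: f wins once n**2 > c, i.e. n > sqrt(c).
for c in (10, 1_000, 1_000_000):
    n = math.isqrt(c) + 1      # first integer with n**2 > c
    assert n**3 > c * n        # f(n) has escaped c*g(n)
    print(f"c = {c:>9}: n^3 exceeds c*n from n = {n} on")
```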

As Ed mentioned, this analysis won't give you an exact run time but a growth rate depending on the input n. If g(n) and f(n) are always only (at most) a constant factor away from each other, then the growth rate will be the same for both.

In this kind of time complexity analysis we don't really care about constants, which in most cases is OK, but in some cases you actually should take them into account. For instance, if you are working on small sets, an O(n^2) algorithm might actually be faster than an O(n log n) one because of constants.
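A toy illustration of that (the cost formulas and constants are made up purely for the example, not measured): an O(n^2) algorithm with a small constant beats an O(n log n) one with a large constant on small inputs, and loses on large ones:

```python
import math

def quadratic_cost(n):
    # hypothetical O(n^2) algorithm with a small constant factor
    return 2 * n * n

def nlogn_cost(n):
    # hypothetical O(n log n) algorithm with a large constant factor
    return 100 * n * math.log2(n)

for n in (8, 64, 4096):
    cheaper = "n^2" if quadratic_cost(n) < nlogn_cost(n) else "n log n"
    print(f"n = {n:>4}: cheaper algorithm is {cheaper}")
```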

Second question: yes, this is a common problem with Big-O. You can use an arbitrarily large function, which is why we usually try to find the "tightest" g(n) we can; otherwise there's not much point in finding it. That's also why Big-Theta is much more useful than Big-O, as it tells you a tight bound instead of just an upper bound.
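To see the tight-bound idea concretely (the constants here are hand-picked for this example): f(n) = 3n^2 + n is Big-Theta of n^2 because a lower constant c1 and an upper constant c2 sandwich it:

```python
def f(n):
    return 3 * n**2 + n

# Hand-picked witnesses: 3*n^2 <= f(n) <= 4*n^2 for all n >= 1.
c1, c2, n0 = 3, 4, 1
assert all(c1 * n**2 <= f(n) <= c2 * n**2 for n in range(n0, 10_000))
print("n^2 is a tight (Theta) bound for f")
```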

Mateusz Dymczyk
  • Does Big Theta provide a tighter upper bound? Or does it just give an upper and a lower bound, therefore giving you a more representative range of what the growth will look like? – Justin Sep 04 '14 at 04:19
  • It gives you both upper and lower bounds, which means it's a tight bound (upper == lower). Big Theta is there when Big-O == Big-Omega; no range involved. – Mateusz Dymczyk Sep 04 '14 at 04:20
  • Ok, so let me try to get this straight with the help of Ed Heal's comment. The value of c doesn't matter because it doesn't change the shape of the graph (when you graph g(n)); it basically just moves the line further up the y axis. So as the y value increases at the same rate, the growth rate is still the same. This makes sense to me, but why do I need c at all? Is it to compensate for the fact that we ignore constants in f(n)? Therefore f(n) will likely look larger on a graph if we don't introduce c? – Justin Sep 04 '14 at 04:29
  • It does not move it up the axis. It stretches the axis. The `c` is just to make the statement true. You need to read it mathematically. – Ed Heal Sep 04 '14 at 04:31
  • And the point of that is that f(n) will likely contain constants, and therefore g(n) (being an abstract version of the growth rate) will not provide an upper bound on f(n). Is that correct? – Justin Sep 04 '14 at 04:42
  • @Justin you don't care about `c` because you are not checking whether `f(n)` is really close to `g(n)`, but whether the growth rates are the same/similar for both and don't change with `n`. Both `2*n^2` and `n^2` have the same growth rate; they grow at the same pace. The first one will have twice as big values, but that does not matter for us. As I mentioned, this can sometimes be misleading in practice. Yes, as you mentioned, `f(n)` can have constants, can be of the form `n^2+n+1`, etc., but in the end the growth rate will be the same. – Mateusz Dymczyk Sep 04 '14 at 04:42
  • Sure, that makes sense to me. I guess what won't take hold in my mind is this: if all we really care about are the growth rates, why do we put this c value in at all? – Justin Sep 04 '14 at 04:46
  • @MateuszDymczyk - It is used to identify how algorithms behave, so your comment is correct in one sense. But in practice you can also use it to know how to speed an algorithm up (more memory, more CPU) or for algorithm comparison. – Ed Heal Sep 04 '14 at 04:47
  • @EdHeal yes of course you are right, I was just pointing out that sometimes that constant will make a difference (if I remember correctly that's why insertion sort is faster on small inputs than the "faster" quicksort?). – Mateusz Dymczyk Sep 04 '14 at 04:49
  • @Justin well, without that constant in the definition, `g(n)=n^2` would not hold for `f(n)=3*n^2` and you would have to be more precise and say `g(n)=3.1*n^2`, which isn't useful for us because, as Ed and I mentioned, we are not approximating the actual run time but only how the function grows. Which, again, might be misleading, as both `f(n)=n^2` and `k(n)=10000*n^2` have the same Big-O complexity but for small inputs the runtimes might be quite different; that's not what Big-O is for. – Mateusz Dymczyk Sep 04 '14 at 04:52
  • Ok, it's making *more sense* to me now. I think I can grasp it well enough to continue in my study. Thanks to both of you! – Justin Sep 04 '14 at 04:54
  • @Justin no problem! Just as a finishing note: for me it really helps to remember that Big-O describes a whole class of functions and not a single function, hence the `c`, which allows you to compare it with arbitrary functions (or, well, functions with arbitrary constants). – Mateusz Dymczyk Sep 04 '14 at 04:57
  • 1
    It finally just clicked into place for me when I re-read the definition of Big theta. plus some constant c1 and minus some constant c2, g(n) provides and upper and lower bound on f(n). Off-course that could only be the case if the growth rates were the same as n gets large! So Big Theta is basically saying that g(n) is not some arbitrary upper or lower bound, it IS the growth rate, hence why people say its a tight bound. I feel like I'm in the Matrix, all I can see is green scrolling code :P – Justin Sep 04 '14 at 05:15
  • Yes, that's why I said that Big Theta is there when Big-O and Big-Omega are the same :-) Glad you got it! – Mateusz Dymczyk Sep 04 '14 at 05:16

When choosing to classify an algorithm into a complexity class, is the general rule of thumb to just choose the lowest growth class that still holds according to the definition of Big O?

In terms of notation, just like we have big-O for upper bounds, we have big-Omega for lower bounds and big-Theta for when you are able to show that the upper bound and the lower bound match.

https://en.wikipedia.org/wiki/Big_O_notation#The_Knuth_definition

Assuming that the Knuth quote is correct, we can say that you are not alone in thinking that results involving tight asymptotic bounds are more useful :) Sometimes people say big-O when they actually mean big-Theta, but other times they just don't care or haven't managed to find the lower bound.

It seems like I could choose a very large value for c and make the whole thing arbitrary by inflating even small values of g(n).

For functions with different asymptotic growth rates, the c doesn't matter. No matter how big or how small you choose c to be, there will be an n when the faster growing function catches up. The constant factor is there to allow you to ignore constant multipliers when things have the same growth rate. For example, when it comes to big-O, f(x) = 2x and g(x) = 3x both have the same growth rate.
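Here is a small sketch of both points (the particular numbers are arbitrary choices of mine):

```python
# Point 1: different growth rates (n**2 vs c*n).
# Whatever c you pick, n**2 passes c*n at n = c + 1.
for c in (5, 500, 50_000):
    n = c + 1
    assert n**2 > c * n
    print(f"c = {c:>6}: n^2 passes c*n at n = {n}")

# Point 2: same growth rate. f(x) = 2x and g(x) = 3x bound each
# other with constant factors, so each is big-O of the other.
assert all(2 * x <= 1 * (3 * x) for x in range(1, 10_000))  # f = O(g) with c = 1
assert all(3 * x <= 2 * (2 * x) for x in range(1, 10_000))  # g = O(f) with c = 2
```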

hugomg
  • 1
    Your comment "No matter how big or how small you choose c to be, there will be an n when the faster growing function catches up", is very useful. Thanks! – Justin Sep 04 '14 at 04:57