25

While searching for answers relating to "Big O" notation, I have seen many SO answers such as this, this, or this, but I still have not clearly understood some points.

Why do we ignore the coefficients?

For example, this answer says that the final complexity of 2N + 2 is O(N); we remove the leading coefficient 2 and the final constant 2 as well.

Removing the final constant of 2 is perhaps understandable. After all, N may be very large, and so "forgetting" the final 2 may only change the grand total by a small percentage.

However, I cannot clearly understand how removing the leading coefficient does not make a difference. If the leading 2 above became a 1 or a 3, the percentage change to the grand total would be large.

Similarly, apparently 2N^3 + 99N^2 + 500 is O(N^3). How do we ignore the 99N^2 along with the 500?

phs
swdeveloper
  • Simple answer - because that's how big O is defined. You probably want to ask - why do we care for big O notation, which ignores coefficients, rather than using something that does not, like the [tilde notation](http://math.stackexchange.com/q/316964) – amit Apr 29 '15 at 21:47

7 Answers

18

The purpose of Big-O notation is to identify the dominant factor in the asymptotic behavior of a function as its argument tends towards infinity.

As we walk through the function domain, some factors become more important than others.

Imagine f(n) = n^3+n^2. As n goes to infinity, n^2 becomes less and less relevant when compared with n^3.

But that's just the intuition behind the definition. In practice we ignore some portions of the function because of the formal definition:

f(x) = O(g(x)) as x->infinity

if and only if there is a positive real M and a real x_0 such that

|f(x)| <= M|g(x)| for all x > x_0.

That's the definition from Wikipedia. What it actually means is that there is a point (x_0) after which some multiple of g(x) dominates f(x). The definition acts like a loose upper bound on the value of f(x).
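To tie the definition back to the question's 2N + 2 example, here is a minimal sketch (an illustration of the definition, not anything from the answer itself) that checks the bound numerically with the witnesses M = 3 and x_0 = 2:

```
# Checking the Big-O definition for f(x) = 2x + 2 and g(x) = x.
# Claim: 2x + 2 = O(x), witnessed by M = 3 and x_0 = 2,
# since 2x + 2 <= 3x whenever x >= 2.

def f(x):
    return 2 * x + 2

def g(x):
    return x

M, x0 = 3, 2
assert all(abs(f(x)) <= M * abs(g(x)) for x in range(x0, 100_000))
print("2x + 2 <= 3x for all tested x >= 2, so 2x + 2 is O(x)")
```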

From that we can derive many other properties, like f(x) + K = O(f(x)), x^n + x^(n-1) = O(x^n), etc. It's just a matter of using the definition to prove those.

In particular, the intuition behind removing the coefficient (K*f(x) = O(f(x))) lies in what we try to measure with computational complexity. Ultimately it's all about time (or any resource, actually). But it's hard to know how much time each operation takes. One algorithm may perform 2n operations and the other n, but the latter may have a large constant time associated with each operation. So, for this purpose, it isn't easy to reason about the difference between n and 2n.
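As a toy illustration of that last point (the operation counts and per-operation costs below are made up, purely hypothetical): an algorithm doing 2n cheap operations can easily beat one doing n expensive operations, yet both are O(n).

```
# Hypothetical per-operation costs: algorithm A does 2n "cheap" steps,
# algorithm B does n "expensive" steps. Both are O(n), but with these
# made-up constants A is the faster one in practice.

CHEAP_COST = 1      # assumed cost of one step of algorithm A
EXPENSIVE_COST = 5  # assumed cost of one step of algorithm B

def cost_a(n):
    return 2 * n * CHEAP_COST      # 2n operations

def cost_b(n):
    return n * EXPENSIVE_COST      # n operations

for n in (10, 1000, 10**6):
    print(n, cost_a(n), cost_b(n))  # A wins despite the 2n count
```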

Juan Lopes
    And what about removing the co-efficient? – swdeveloper Apr 29 '15 at 20:52
  • this is the best supplemental definition for Big O I've seen – csguy Jan 21 '20 at 02:18
  • So the Big O won't be affected if the programmer writes a loop like this? [HYPOTHETICALLY] `for(int i ; i < 1000000*n ; i++)`. This is still O(n), even though a plain n-iteration loop does far better than this 1000000*n one. When n values are close to zero, the difference is huge. – aryankarim Jun 10 '22 at 10:04
  • @aryankarim Exactly. Big O has limited use in comparing algorithms of the same complexity. But imagine writing `for(int i ; i < n*n ; i++)` instead. That O(n^2) algorithm will perform more loops than the first for n > 1000000, even though this one does not have an absurd constant factor. Every algorithm with complexity higher than O(n) will have some threshold after which it performs worse than the example you gave. – Juan Lopes Jun 11 '22 at 00:55
15

From a (complexity) theory point of view, the coefficients represent hardware details that we can ignore. Specifically, the Linear Speedup Theorem dictates that for any problem we can always throw an exponentially increasing amount of hardware (money) at a computer to get a linear boost in speed.

Therefore, modulo expensive hardware purchases, two algorithms that solve the same problem, one running at twice the speed of the other for all input sizes, are considered essentially the same.

Big-O (Landau) notation has its origins independently in number theory, where one of its uses is to create a kind of equivalence between functions: if a given function is bounded above by another and simultaneously is bounded below by a scaled version of that same other function, then the two functions are essentially the same from an asymptotic point of view. The definition of Big-O (actually, "Big-Theta") captures this situation: the "Big-O" (Theta) of the two functions are exactly equal.
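For reference, the standard Big-Theta definition being alluded to, written in the same style as the Big-O definition above:

f(x) = Θ(g(x)) as x -> infinity

if and only if there are positive reals M_1, M_2 and a real x_0 such that

M_1|g(x)| <= |f(x)| <= M_2|g(x)| for all x > x_0.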

The fact that Big-O notation allows us to disregard the leading constant when comparing the growth of functions makes Big-O an ideal vehicle to measure various qualities of algorithms while respecting (ignoring) the "freebie" optimizations offered by the Linear Speedup Theorem.

phs
0

> Another thing is that, from what I have understood, the complexity of 2N^3 + 99N^2 + 500 will be O(N^3). So how do we even ignore/remove the 99N^2 portion? Will it not make a difference when, let's say, N is one million?

That's right, in that case the 99N^2 term is far overshadowed by the 2N^3 term. The point where they cross is at N=49.5, much less than one million.
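A quick numeric check of that claim (an illustrative throwaway, not part of the original answer):

```
# Compare the two terms of 2N^3 + 99N^2 + 500 at N = 1,000,000.
N = 10**6
cubic = 2 * N**3          # 2e18
quadratic = 99 * N**2     # 9.9e13
print(quadratic / cubic)  # ~4.95e-05, i.e. about 0.005% of the cubic term
```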

But you bring up a good point. Asymptotic computational complexity analysis is in fact often criticized for ignoring constant factors that can make a huge difference in real-world applications. However, big-O is still a useful tool for capturing the efficiency of an algorithm in a few syllables. It's often the case that an n^2 algorithm will be faster in real life than an n^3 algorithm for nontrivial n, and it's almost always the case that a log(n) algorithm will be much faster than an n^2 algorithm.

In addition to being a handy yardstick for approximating practical efficiency, it's also an important tool for the theoretical analysis of algorithm complexity. Many useful properties arise from the composability of polynomials - this makes sense because nested looping is fundamental to computation, and those correspond to polynomial numbers of steps. Using asymptotic complexity analysis, you can prove a rich set of relationships between different categories of algorithms, and that teaches us things about exactly how efficiently certain problems can be solved.

Rag
0

Big O provides a good estimate of which algorithms are more efficient for larger inputs, all things being equal; this is why for an algorithm with an n^3 term and an n^2 term we ignore the n^2 term, because even if the n^2 term has a large constant it will eventually be dominated by the n^3 term.

However, real algorithms incorporate more than simple Big O analysis; for example, a sorting routine will often start with an O(n * log(n)) divide-and-conquer algorithm like quicksort or mergesort, and when the partitions become small enough it will switch to a simpler O(n^2) algorithm like insertion sort - for small inputs insertion sort is generally faster, although a basic Big O analysis doesn't reveal this.
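A minimal sketch of that hybrid idea, with an arbitrary cutoff of 16 (this is an illustration, not the exact strategy any particular library uses):

```
# Merge sort that falls back to insertion sort for small slices.
# The cutoff of 16 is an arbitrary illustrative choice.
CUTOFF = 16

def insertion_sort(a):
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def hybrid_sort(a):
    if len(a) <= CUTOFF:               # small input: O(n^2) but tiny constants
        return insertion_sort(a)
    mid = len(a) // 2
    left = hybrid_sort(a[:mid])
    right = hybrid_sort(a[mid:])
    # merge the two sorted halves
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(hybrid_sort(list(range(100, 0, -1))))  # sorts a reversed list of 100 items
```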

The constant factors often aren't very interesting, and so they're omitted - certainly a difference in factors on the order of 1000 is interesting, but usually the difference in factors is smaller, and then there are many more constant factors to consider that may dominate the algorithms' constants.

Let's say I've got two algorithms, the first with running time 3*n and the second with running time 2*n, each with comparable space complexity. This analysis assumes uniform memory access; what if the first algorithm interacts better with the cache, and this more than makes up for the worse constant factor? What if more compiler optimizations can be applied to it, or it behaves better with the memory management subsystem, or requires less expensive IO (e.g. fewer disk seeks or fewer database joins or whatever), and so on? The constant factor for the algorithm is relevant, but there are many more constants that need to be considered. Often the easiest way to determine which algorithm is best is just to run them both on some sample inputs and time the results; over-relying on the algorithms' stated constant factors would obscure the need for this step.
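A hedged sketch of that last suggestion, timing two stand-in implementations on a sample input (the function bodies here are placeholders, not real candidate algorithms):

```
# Time two candidate implementations on the same sample input.
# algo_a / algo_b are stand-ins for whatever two algorithms are being compared.
import timeit

def algo_a(data):         # e.g. the "3*n" algorithm
    return sum(x * 3 for x in data)

def algo_b(data):         # e.g. the "2*n" algorithm
    return sum(x * 2 for x in data)

data = list(range(100_000))
print("algo_a:", timeit.timeit(lambda: algo_a(data), number=100))
print("algo_b:", timeit.timeit(lambda: algo_b(data), number=100))
```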

Zim-Zam O'Pootertoot
0

The mathematical reason:

The real reason why we do this is the way Big-O notation is defined: a sequence (or let's use the word function) f(n) is in O(g(n)) when the sequence f(n)/g(n) is bounded. Example:

f(n)= 2*n^2
g(n)= n^2

f(n) is in O(g(n)) because (2*n^2)/(n^2) = 2 for every n. The term (2*n^2)/(n^2) doesn't become infinitely large (it's always 2), so the quotient is bounded and thus 2*n^2 is in O(n^2).

Another one:

f(n) = n^2
g(n) = n

The term n^2/n (= n) becomes infinitely large as n goes to infinity, so n^2 is not in O(n).

The same principle applies, when you have

f(n) = n^2 + 2*n + 20
g(n) = n^2

(n^2 + 2*n + 20)/(n^2) is also bounded, because it tends to 1, as n goes to infinity.
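A quick numeric illustration (my own addition, not from the original answer) of that ratio approaching 1:

```
# Watch (n^2 + 2n + 20) / n^2 approach 1 as n grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, (n**2 + 2*n + 20) / n**2)
# 10 -> 1.4, 100 -> 1.022, 10000 -> ~1.0002, 1000000 -> ~1.000002
```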


Big-O notation basically says that your function f(n) is, from some value of n onward, smaller than a function g(n) multiplied by a constant. With the previous example:

2*n^2 is in O(n^2), because we can find a value C so that 2*n^2 is smaller than C*n^2. In this example we can pick C to be 5 or 10, and the condition will be satisfied.

So what do you get out of this? If you know your algorithm has complexity O(10^n) and you input a list of 4 numbers, it may take only a short time. If you input 10 numbers, it will take a million times longer! Whether it's one million times longer or 5 million times longer doesn't really matter here. You can always use 5 more computers and have it run in the same amount of time; the real problem is that it scales incredibly badly with input size.
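(A quick check of that factor, under the answer's assumption of exactly 10^n steps: 10^10 / 10^4 = 10^6, i.e. one million times more work.)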

Atomix
0

Big O notation is not an absolute measure of complexity.

Rather, it is a description of how the complexity will change as the variable changes. In other words, as N increases, the complexity grows on the order of f(N).

To explain why some terms are not included, we look at how fast the terms increase.

So, Big O(2n + 2) has two terms, 2n and 2. Looking at the rate of increase of the constant term 2: it never increases, so it does not contribute to the rate of increase at all and it goes away. Also, since 2n increases faster than 2, the 2 turns into noise as n gets very large.

Similarly, Big O(2n^3 + 99n^2) compares Big O(2n^3) and Big O(99n^2). For small values, say n < 50, the 99n^2 term contributes a larger nominal percentage than 2n^3. However, if n gets very large, say 1,000,000, then 99n^2, although nominally large, is insignificant (roughly 1/20,000th) compared to the size of 2n^3.

As a consequence Big O(n^i) < Big O(n^(i+1)).

Coefficients are removed because of the mathematical definition of Big O.

To simplify: the definition gives Big O(c*f(n)) = Big O(f(n)) for any positive constant c. The reason is purely mathematical, and the proof is short: the factor c can simply be absorbed into the bounding constant in the formal definition, so it never affects the classification.

Steve
-1

For practical applications the constants do matter, so an algorithm that takes 2*n^3 steps will beat one that takes 1000*n^2 steps for inputs with n smaller than 500.
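(The crossover point follows directly: 2*n^3 < 1000*n^2 exactly when n < 500.)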

There are two main ideas here: 1) if your algorithm should be great for any input, it should have a low time complexity, and 2) n^3 grows so much faster than n^2 that preferring n^3 over n^2 almost never makes sense.

RasmusWL