
In Big-O notation, O(N) and O(2N) describe the same complexity. That is to say, the growth rate of the time or space complexity of an algorithm that is O(2N) is essentially the same as that of one that is O(N). This is especially clear when either is compared to an algorithm with a complexity like O(N^2) for an extremely large value of N: O(N) increases linearly while O(N^2) increases quadratically.

So I understand why O(N) and O(2N) are considered equal, but I'm still uncertain about treating the two as completely equal. In a program where the number of inputs N is 1 million or more, it seems to me like halving the time complexity would actually save quite a lot of time, because the program would have potentially millions fewer actions to execute.

I'm thinking of a program that contains two for-loops. Each for-loop iterates over the entire length of a very large array of N elements. This program would have a complexity of O(2N). O(2N) reduces to O(N), but I feel like an implementation that only requires one for-loop instead of two would make it a faster program (even if a single for-loop implementation sacrificed some functionality for the sake of speed, for example).
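For example, here is a hypothetical sketch of what I mean (the per-element work is just a placeholder):

```python
# Hypothetical sketch: both functions are O(N), but the first makes two
# passes over the data ("O(2N)") while the second fuses the same work
# into a single pass. Both assume the list is non-empty.

def two_passes(data):
    total = 0
    for x in data:            # first loop over all N elements
        total += x
    largest = data[0]
    for x in data:            # second loop over all N elements
        if x > largest:
            largest = x
    return total, largest

def one_pass(data):
    total = 0
    largest = data[0]
    for x in data:            # single loop over all N elements
        total += x
        if x > largest:
            largest = x
    return total, largest
```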

My question:

If you had an algorithm with time complexity O(2N), would optimizing it to have O(N) time complexity make it twice as fast?

To put it another way, is it ever significantly beneficial to optimize an O(2N) algorithm down to O(N)? I imagine there would be some increase in the speed of the program - or would the increase be so insignificant that it isn't worth the effort, since O(2N) == O(N)?

RabbitOTM
  • BigO notation isn't a metric for speed; it's an indication of complexity and its growth rate. Your specific question has been asked, discussed, and answered here: https://stackoverflow.com/questions/25777714/which-algorithm-is-faster-on-or-o2n – MatBailie Apr 02 '21 at 02:58
  • Real-world modern computer performance for a specific problem size isn't a simple sum of costs anyway; cache locality and instruction-level parallelism (or lack thereof) can make a significant difference to how fast the *same* amount of work can run (not just the same complexity class). [What considerations go into predicting latency for operations on modern superscalar processors and how can I calculate them by hand?](https://stackoverflow.com/q/51607391) – Peter Cordes Apr 02 '21 at 03:04
  • @MatBailie Hi, thank you for engaging with the question. It may have been a mistake to use the term "fast" in the title, but the body of my question shows that I know it is referring to complexity and growth rate. I've actually seen that stackoverflow link as I was writing this question, but didn't feel like it addressed my concern of being able to optimize a program if it were possible to determine a program was O(2N), then to change the implementation so that it was O(N), even if both are considered the same complexity. – RabbitOTM Apr 02 '21 at 03:07
  • @RabbitOTM basically, the linked Q&A points out that your statement "changing `O(2N)` to `O(N)`" does not make sense, because `O(2N)` and `O(N)` are *the same thing*. Since your question and your thinking on this are based on a fallacy, the only valid answer is to point out the fallacy ... which is what the dup link does. Big O complexity and performance are *different*. The former is not even necessarily an indicator of the latter. – Stephen C Apr 02 '21 at 04:39
  • An example of another similarly fallacious question; which is faster, a one litre engine, or a two litre engine? Engine size CAN be Related to speed, but engine size is NOT a Measure of speed. I doubt you'd want to pit a 2 litre saloon against a 1 litre racing motor bike. – MatBailie Apr 02 '21 at 06:18

1 Answer


Time complexity is not the same as speed. For a given size of data, an O(N) program might be slower, faster, or the same speed as an O(2N) one. In fact, for a given size of data, an O(N) program might even be slower, faster, or the same speed as an O(N^2) one.

So if Big-O doesn't mean anything, why are we talking about it anyway?

Big-O notation describes the behaviour of a program as the size of the data increases. This behaviour is always relative. In other words, Big-O tells you the shape of the asymptotic curve, but not its scale or dimension.

Let's say you have a program A that is O(N). This means that processing time will be linearly proportional to data size (ignoring real-world complications like cache sizes that might make the run-time more like piecewise-linear):

  • for 1000 rows it will take 3 seconds
  • for 2000 rows it will take 6 seconds
  • for 3000 rows it will take 9 seconds

And for another program B which is also O(N):

  • for 1000 rows it will take 1 second
  • for 2000 rows it will take 2 seconds
  • for 3000 rows it will take 3 seconds

Obviously, the second program is 3 times faster per row, even though both are O(N). Intuitively, this tells you that both programs go through every row and spend some fixed amount of time processing it. The difference in time from 2000 to 1000 is the same as the difference from 3000 to 2000 - this means that the time grows linearly; in other words, the time needed for one record does not depend on the total number of records. This is equivalent to the program doing some kind of for-loop, as for example when calculating a sum of numbers.
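As a (hypothetical) sketch, this is the typical shape of such an O(N) program: one fixed chunk of work per row.

```python
def sum_of_rows(rows):
    """O(N): a fixed amount of work for each row, so total time grows
    linearly with the number of rows."""
    total = 0
    for value in rows:   # one pass over the data
        total += value
    return total
```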

And, since the programs are different and do different things, it doesn't make any sense to compare 1 second of program A's time to 1 second of program B's time anyway. You would be comparing apples and oranges. That's why we don't care about the constant factor and we say that O(3N) is equivalent to O(N).

Now imagine a third program C, which is O(N^2).

  • for 1000 rows it will take 1 second
  • for 2000 rows it will take 4 seconds
  • for 3000 rows it will take 9 seconds

The difference in time here between 3000 and 2000 is bigger than the difference between 2000 and 1000. The more data there is, the bigger the increase. This is equivalent to a program doing a for-loop inside a for-loop - as, for example, when searching for pairs in the data.
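A (hypothetical) sketch of that shape, comparing every row against every later row:

```python
def find_duplicate_pairs(rows):
    """O(N^2): a for-loop inside a for-loop, roughly N*(N-1)/2 comparisons."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):   # compare row i with every later row
            if rows[i] == rows[j]:
                pairs.append((i, j))
    return pairs
```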

When your data is small, you might not care about a 1-2 second difference. If you compare programs A and C just from the above timings, without understanding the underlying behaviour, you might be tempted to say that A is faster. But look what happens with more records:

  • for 10000 rows program A will take 30 seconds
  • for 10000 rows program C will take 100 seconds
  • for 20000 rows program A will take 60 seconds
  • for 20000 rows program C will take 400 seconds

Performance that initially looked comparable quickly diverges - at 20000 rows program C is already almost 7 times slower than A, and at 200000 rows it would be roughly 65 times slower. There is no way in this world that running C on a faster CPU could ever keep up with A, and the bigger the data, the more this is true. The thing that makes all the difference is scalability. This means answering questions like: how big a machine are we going to need in a year's time, when the database will have grown to twice its size? With O(N), you are generally OK - you can buy more servers, more memory, use replication etc. With O(N^2) you are generally OK up to a certain size, at which point buying any number of new machines will not be enough to solve your problems any more and you will need to find a different approach in software, or run it on massively parallel hardware such as GPU clusters. With O(2^N) you are pretty much fucked unless you can somehow limit the maximum size of the data to something which is still usable.
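As a rough, purely illustrative back-of-the-envelope check (the starting size is arbitrary), you can see how differently these classes react when the data doubles:

```python
# Hypothetical projection: by what factor does the time grow when the
# data size doubles, for each complexity class?
growth_rates = {
    "O(N)":   lambda n: n,
    "O(N^2)": lambda n: n ** 2,
    "O(2^N)": lambda n: 2 ** n,
}

n = 20  # arbitrary current "size", just for illustration
for name, f in growth_rates.items():
    print(f"{name}: doubling the data multiplies the time by {f(2 * n) / f(n):g}")
# O(N): 2, O(N^2): 4, O(2^N): about a million
```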

Note that the above examples are theoretical and intentionally simplified; as @PeterCordes pointed out, the times on a real CPU might be different because of caching, branch misprediction, data alignment issues, vector operations and a million other implementation-specific details. Please see his links in the comments below.

jurez
  • An implementation being O(N) doesn't mean true performance for small finite sizes will truly scale linearly. Not on a modern computer with caches and TLBs whose size / coverage might be larger or smaller than the problem size. It might be reasonable to guess that performance will be close to piecewise-linear, with different slopes and jumps between domains as the problem gets bigger than L1d cache, bigger than L2 cache, big enough to get TLB misses, etc. For L1d-sized problems you might be limited by the CPU core, especially if you (or your compiler) didn't unroll to e.g. hide FP latency. – Peter Cordes Apr 02 '21 at 03:38
  • TL:DR your examples of concrete sizes / times are good, but could use more weasel words to explain that it's not always that simple. – Peter Cordes Apr 02 '21 at 03:39
  • @PeterCordes Agreed, answer expanded and updated. – jurez Apr 02 '21 at 03:45
  • Your edit doesn't seem to address the concern I raised. You're still missing any qualifiers or notes that *This means that processing time will be linearly proportional to data size:* is an over-simplification. Usually big-O analysis counts operations, or something other than clock cycles, and it's *not* necessarily true that increasing a loop bound by 1 increases the cost by 1. Looping over data gets a lot slower at the boundary where it doesn't fit in L2 cache. e.g. see figure 3.10 in https://www.akkadia.org/drepper/cpumemory.pdf - cycles / list-element vs. problem size, sequential read – Peter Cordes Apr 02 '21 at 03:56
  • (That's the hopefully-well-known [What Every Programmer Should Know About Memory?](https://stackoverflow.com/q/8126311) article.) – Peter Cordes Apr 02 '21 at 03:56
  • @PeterCordes Absolutely, and thanks for pointing it out. It is impossible to put everything in a single answer and I intentionally try to keep things simple to make the point understandable. I encourage you to write your own answer because it would make important contribution. – jurez Apr 02 '21 at 03:59
  • I like this answer, I just think it's important to mention when something is really true all the way down to the nitty gritty details, or when it's a simplification. Especially when the question is about an intentional "simplification" like big-O complexity classes. I've already written an answer on [What considerations go into predicting latency for operations on modern superscalar processors and how can I calculate them by hand?](https://stackoverflow.com/q/51607391) which I linked in a comment on the question, and on whether "what Every programmer... Memory" is still relevant (see above). – Peter Cordes Apr 02 '21 at 04:01