
So I was writing a prime number generator in assembly (to learn the language), and when I started benchmarking it, I found that it was running in linear time rather than the expected O(N * sqrt(N)) time.

Why is this happening?

I went through and commented the code and put it up on GitHub here: https://github.com/kcolford/assembly_primes. It should be fairly readable, but if anyone wants me to comment it more thoroughly (e.g. with explanations of each instruction), I'm happy to do that; I just didn't think it would be necessary.

The assembler and linker I'm using are the ones from GNU binutils, and the code is written for the x86_64 architecture running the Linux kernel.

randomusername
  • 1. You've confirmed that it's giving the correct answers? 2. Your sample size is large enough that startup times etc. are small noise in the results? 3. You don't have a large linear coefficient that's overwhelming a small N*sqrt(N) coefficient? – Phil Perry Dec 10 '13 at 23:48
  • The assembly code is bare and robust; yes, all the answers are correct (it took forever for the Python script to go through the whole file). It ends up processing 100,000,000 numbers, and the only overhead is in the I/O routine that prints the numbers out. – randomusername Dec 10 '13 at 23:51
  • So @PhilPerry do you have any ideas? – randomusername Dec 10 '13 at 23:52
  • It's been forever since I've done any order calculations, but you've described its worst-case run-time: that would only hold if, for every number N it looked at, it had to check all the primes up to sqrt(N). That's simply not the case; if my hand-waving math is good enough (and it probably isn't), it's more like log(N), or even log(log(N)) – Anya Shenanigans Dec 11 '13 at 00:39
  • @Petesh really? Is that because it only has to do `sqrt(N)` calculations on a prime number and all other numbers are eliminated before then? Could you post an answer on how you arrived at `log(N)` because I don't know enough math to figure that out. – randomusername Dec 11 '13 at 01:51
  • http://stackoverflow.com/questions/2582732/time-complexity-of-sieve-of-eratosthenes-algorithm – lurker Dec 11 '13 at 02:48
  • You're constructing a table of prior primes, which is used as a filter for any value N. You need to look at the growth rate of this table to determine the second-order term. The growth rate of this table is determined by the [prime number theorem](http://en.wikipedia.org/wiki/Prime_number_theorem). It was 1am when I posted that, which is why I hand-waved the answer. – Anya Shenanigans Dec 11 '13 at 08:56
  • I think Petesh and David Norris had some pretty good answers, so at the moment I have nothing new to contribute. – Phil Perry Dec 11 '13 at 18:16

1 Answer


Big-O complexity is tricky to measure experimentally. As Phil Perry suggested, there are often linear terms (or other lower-order terms) that overwhelm the primary O() term for "small" runs. These extra terms are hidden by big-O notation, but they always exist in real code, and they matter when analyzing actual run times.
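To get a feel for how that plays out, here's a quick sketch in Python. The cost model and the constants `a` and `b` are made up purely for illustration; they aren't measured from your code:

```python
import math

# Hypothetical cost model: T(n) = a*n*log(log(n)) + b*n,
# i.e. a sieve-like term plus linear overhead (I/O, loop bookkeeping).
# The constants a and b are invented for this example.
a, b = 1.0, 5.0

for n in (10**4, 10**6, 10**8):
    sieve_term = a * n * math.log(math.log(n))
    linear_term = b * n
    print(f"n={n:>9}  sieve/linear ratio = {sieve_term / linear_term:.2f}")
```

Even at n = 10^8, log(log(n)) is only about 2.9, so a linear term with a modest constant can easily dominate what you actually measure.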

When I measure your sample code on my test machine, I get a clearly nonlinear relationship. I can't post a plot because of low reputation, but here's a table:

N (= n^2)  Time (s)
  1000000    0.386
  4000000    1.846
  9000000    4.673
 16000000    9.275
 25000000   15.850
 36000000   24.690
 49000000   35.850
 64000000   49.887
 81000000   66.509
100000000   86.855
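If you want to quantify the growth, a least-squares fit of log(time) against log(N) shows a slope noticeably above 1. Here's a quick sketch in Python; the data is just my table above:

```python
import math

# (N, time in seconds) pairs from the table above
data = [(1e6, 0.386), (4e6, 1.846), (9e6, 4.673), (16e6, 9.275),
        (25e6, 15.850), (36e6, 24.690), (49e6, 35.850),
        (64e6, 49.887), (81e6, 66.509), (100e6, 86.855)]

# Least-squares slope of log(time) vs log(N); 1.0 would mean purely linear.
xs = [math.log(n) for n, _ in data]
ys = [math.log(t) for _, t in data]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print(slope)  # about 1.18 on these numbers
```

Anything meaningfully above 1.0 means the growth is superlinear, even if the raw numbers look roughly linear at a glance.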

Also, I didn't thoroughly look through your exact implementation of the algorithm, but the Sieve of Eratosthenes is O(n log log n), not O(n sqrt(n)).
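For reference, here's a minimal Sieve of Eratosthenes (a sketch in Python, not your assembly): each prime p crosses off roughly n/p multiples, and summing n/p over the primes up to n is what yields the O(n log log n) bound.

```python
def sieve(n):
    """Return all primes <= n using the Sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:                         # only need to sieve up to sqrt(n)
        if is_prime[p]:
            for m in range(p * p, n + 1, p):  # cross off multiples of p
                is_prime[m] = False
        p += 1
    return [i for i, prime in enumerate(is_prime) if prime]

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```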

Also note that I/O is usually very expensive compared to number crunching. In your case, on my test machine, at n = 5000, the I/O accounts for almost 30% of the total run time. This would be a significant linear term in the run time analysis. You'd have to thoroughly analyze everything else the code is doing to figure out what else is affecting the measurements.
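If you want to see that split directly, one simple approach (sketched in Python, reusing the `sieve` function from the snippet above) is to time the computation and the output separately:

```python
import sys
import time

t0 = time.perf_counter()
primes = sieve(5_000_000)          # sieve() as defined in the previous snippet
t1 = time.perf_counter()
sys.stdout.write("\n".join(map(str, primes)) + "\n")
t2 = time.perf_counter()

# Report the split on stderr so it doesn't mix with the prime output.
print(f"compute: {t1 - t0:.3f}s   I/O: {t2 - t1:.3f}s", file=sys.stderr)
```

For the assembly program itself, comparing `time ./primes > /dev/null` against writing to a file or a terminal gives a similar picture without touching the code.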

TypeIA