[Edit3] complete re-edit
Your current approach is O(n^1.5), not O(n^2).
Originally I suggested looking at Why are my nested for loops taking so long to compute?, but as Oliver Charlesworth pointed out to me (see About Vectors growth), that should not be much of an issue here (the measurements confirmed it).
So there is no need to preallocate memory for the list; it would just waste memory and, due to cache effects, could even lower the overall performance (at least on my setup).
So how to optimize?
- either lower the constant overhead so the runtime of your iteration improves (even with worse complexity)
- or lower the complexity enough that the extra overhead does not eat up the speedup
I would start with SoF (Sieve of Eratosthenes), but instead of marking numbers as composite I would add the currently iterated sieve step to each number's divisor list. The total work is the harmonic sum n/1 + n/2 + ... + n/n, so this is O(n*log(n)) rather than O(n^2), with much lower overhead (no divisions, and fully parallelisable) if coded right.
Start sieving with all step sizes i=1,2,3,...,n-1. For each number x you hit, do not update the SoF table (you do not need it); instead add the iterated sieve step i to the divisor list of x. Something like:
C++ source:
const int n=1000000;
List<int> divs[n];          // divs[x] = list of all divisors of x
void divisors()
    {
    int i,x;
    for (i=1;i<n;i++)       // sieve step i
     for (x=i;x<n;x+=i)     // all multiples of i
      divs[x].add(i);       // i divides x
    }
This took 1.739 s and found 13969984 divisors total, with a max of 240 divisors per number (including 1 and x). As you can see, it does not use any divisions, and the divisors come out sorted in ascending order.
List<int> is a dynamic list of integers template (something like your vector<>).
You can adapt this to your kind of iteration so the sieve only runs up to nn=sqrt(n) and adds 2 divisors per hit (i and x/i). That roughly halves the number of sieve iterations, but each hit now needs a single division plus a duplicity check (a linear scan of the already found divisors), so the constant overhead is different and you need to measure whether it actually speeds things up. On my setup it is way slower (~2.383 s) despite the reduced iteration count:
const int n=1000000;
List<int> divs[n];          // divs[x] = list of all divisors of x
int i,j,x,y,nn=sqrt(n);
for (i=1;i<=nn;i++)
 for (x=i;x<n;x+=i)
    {
    // add i unless it is already present (divs[x].num is the element count)
    for (y=divs[x].num-1;y>=0;y--)
     if (i==divs[x][y]) break;
    if (y<0) divs[x].add(i);
    // add the paired divisor j=x/i unless it is already present
    j=x/i;
    for (y=divs[x].num-1;y>=0;y--)
     if (j==divs[x][y]) break;
    if (y<0) divs[x].add(j);
    }
The next thing is to use direct memory access for the duplicity check (not sure you can do that with vector<>; my list is capable of it). Do not confuse this with hardware DMA: it just means avoiding per-access array range checking. This lowers the constant overhead of the duplicity check, and the resulting time is [1.793s], which is still a little bit slower than the raw SoF-style version above. So if you have a bigger n, this might be the way to go.
[Notes]
If you want to do prime decomposition instead, then iterate i only through primes (in that case you need the SoF table) ...
If you have problems with the SoF or primes, look at Prime numbers by Eratosthenes quicker sequential than concurrently? for some additional ideas on this.