First simple optimizations:
As mentioned lcm(i,j)
is the same as lcm(j,i)
so you compute almost all therms of the sum twice. Instead you can compute the repeating therms just once and then double the sum result.
remaining non repeating therms lcm(i,j)
where i==j
The lcm(i,i)=i
so they form a series 1+2+3+...+k
which is computable with equation: k*(k+1)/2
prime decomposition 2D LUT using SoE
Both previous optimizations combined are slightly more than 2 times faster (however the complexity is still quadratic so this is still not fast enough for bigger numbers !!!
For more advanced and faster approach you need enough memory so you could do SoE (Sieve of Eratosthenes) based prime decomposition to speed up the GCD and LCM operations considerably (better complexity on bignums and overally smaller constant time). The idea is to have k
lists holding the prime decomposition of each number up to k
.
Then to obtain GCD you just compare the 2 lists and multiply together all primes (with their powers) present in both lists.
The LCM is similar but you multiply almost all primes you just do use those which are present in both lists only once. This can be enhanced further by not using prime decomposition of one number in list and multiply by the number instead.
The search of both list can be done in O(m)
as the SoE will create the lists already sorted where m
is the number of primes that decompose i
or j
... which is much much less then their actual value.
In case you want to pre-allocate the individual i-th
list you can take number of primes up to sqrt(i)
as upper bound. Which can be computed with ~10% error by:
sqrt(i)/ln(sqrt(i));
So I used this just to be sure I have enough:
ceil(1.2*sqrt(i)/ln(sqrt(i)))+1;
On integers you could use log2 (or number of bits) instead + some big enough margin.
However 2D LUT like this will be big for big k
!!! Which will not fit into memory of standard computer so this is a no go for bignum k
.
This approach will give you much bigger boost (measurement show ~10x times faster runtime against the #1+#2 optimizations on small k
and gets better with increasing k
). This approach affects complexity (its better) on bigints but its dependent on the bigint implementation. My estimate its slightly below quadratic on mine bigints but had not tried k
nowhere near thresholds for multiplication as this is still too slow for bigints.
primes up to k
1D LUT using SoE
this will speed up the GCD and LCM operation but not as fast as the previous LUT. On the bright side this requires much less memory. However your current limit would still need ~15.8 TByte of storage which might be possible in future. However this will be still very slow...
using PSLQ
This might be the most promising way to achieve this task fast. Do a PSLQ analysis on the series formed by using
k={ 1,2,3,... }
A(k) = sum_of_lcm_up_to(k)
A = 1,7,28,72,177,303,604,948,1497,...
which might return you more direct equation.
IIRC BPP formula for computing Pi was found this way. Until then it was not known that we can compute individual digits of Pi without needing the previous ones nor that it can be computed without big numbers arithmetics...
However I never perform nor fully understood the PSLQ as its far from my field of expertise my naive understanding is that is something like PCA done on individual digits of series of numbers trying to find numeric relation between iteration of series
Here C++ code that combines #1,#2,#3 compilable with bigints and without too (if not using any 3th party libs) you just rewrite the bigint typedef with whatever you want to use. But beware on 32bit unsigned integers (without bigints) this will overflow with small numbers so the limit is only k<=390
!!!
//---------------------------------------------------------------------------
#include <math.h>
//---------------------------------------------------------------------------
typedef unsigned int bigint; // rewrite int to whatewer bigint datatype you got
//---------------------------------------------------------------------------
// prime decomposition LUT
int pdec_n=0,pdec_siz=0,*pdec_num=NULL,**pdec_LUT=NULL;;
void pdec_exit() // free memory
{
if (pdec_num) delete[] pdec_num;
if (pdec_LUT)
{
if (pdec_LUT[0]) delete[] pdec_LUT[0];
delete[] pdec_LUT;
}
pdec_n=0; pdec_siz=0; pdec_num=NULL; pdec_LUT=NULL;
}
void pdec_init(int n) // pdec_LUT[i][j] holds prime decomposition of i where j={ 0,1,2...pd_num[i]-1 }
{
double x;
int i,j,a;
pdec_exit();
pdec_n=n;
pdec_num=new int[n+1]; // primes in decomposition
pdec_LUT=new int*[n+1]; // decompositions
// compute decompositions size
pdec_num[0]=0;
pdec_num[1]=0;
for (j=0,i=2;i<=n;i++)
{
x=i;
x=sqrt(x); // primes up to sqrt(x)
x=x*log(2.7182818284590452353602874713527)/log(x);
x=ceil(1.2*x)+1; // +20% and round up +1 as a margin for error
a=x;
pdec_num[i]=a;
j+=a;
}
pdec_siz=j+10;
// compute starts of each decomposition
pdec_LUT[0]=new int[j]; // allocate buffer for all decompositions once
pdec_LUT[1]=pdec_LUT[0];
for (j=1,i=2;i<=n;j=i,i++)
{
pdec_LUT[i]=pdec_LUT[j]+pdec_num[j];
pdec_num[j]=0;
}
pdec_num[n]=0;
// SoE prime decomposition
for (i=2;i<=n;i++)
if (pdec_num[i]==0) // is prime?
for (j=i;j<=n;j+=i) // add it to all its multiples
for (a=j;a%i==0;a/=i){ pdec_LUT[j][pdec_num[j]]=i; pdec_num[j]++; } // add also powers
}
//---------------------------------------------------------------------------
bigint pdec_gcd(int a,int b) // fast GCD using pdec_LUT
{
bigint _gcd;
int i,j,ni,nj,*pi,*pj;
pi=pdec_LUT[a]; ni=pdec_num[a];
pj=pdec_LUT[b]; nj=pdec_num[b];
for (_gcd=1,i=0,j=0;(i<ni)&&(j<nj);)
{
a=pi[i];
b=pj[j];
if (a==b){ _gcd*=a; i++; j++; continue; }
if (a<b) i++; else j++;
}
return _gcd;
}
//---------------------------------------------------------------------------
bigint pdec_lcm(int a,int b) // fast LCM using pdec_LUT
{
bigint _lcm;
int i,j,ni,nj,*pi,*pj;
if (pdec_num[a]<pdec_num[b]){ i=a; a=b; b=i; } // b has less multiplications
pi=pdec_LUT[a]; ni=pdec_num[a];
pj=pdec_LUT[b]; nj=pdec_num[b];
for (_lcm=a,i=0,j=0;(i<ni)&&(j<nj);) // a is fully multiplied
{
a=pi[i];
b=pj[j];
if (a==b){ i++; j++; continue; }
if (a< b){ i++; }
if (a> b){ _lcm*=b; j++; } // use primes from b only if not present in a
}
for (;j<nj;j++) _lcm*=pj[j];
return _lcm;
}
//---------------------------------------------------------------------------
int gcd(int a,int b) // slow euclid GCD
{
if(!b) return a;
return gcd(b, a % b);
}
//---------------------------------------------------------------------------
bigint lcm(int a,int b) // slow LCM
{
bigint _lcm;
_lcm=a;
_lcm*=b;
_lcm/=gcd(a,b);
return _lcm;
}
//---------------------------------------------------------------------------
bigint sum_of_lcm0(int k)
{
int i,j;
bigint s;
// sum of lcm(i,j) up to k
for (s=0,i=1;i<=k;i++)
for (j=i+1;j<=k;j++) // do not compute the same terms twice
s+=lcm(i,j);
s+=s; // double the repeating therms
s+=k*(k+1)>>1; // add sum of series: 1+2+3+...+k
return s;
}
//---------------------------------------------------------------------------
bigint sum_of_lcm1(int k)
{
int i,j;
bigint s;
// sum of lcm(i,j) up to k
for (s=0,i=1;i<=k;i++)
for (j=i+1;j<=k;j++)
s+=pdec_lcm(i,j);
s+=s; // double the repeating therms
s+=k*(k+1)>>1; // add sum of series: 1+2+3+...+k
return s;
}
//---------------------------------------------------------------------------
Usage:
int k;
bigint s;
k=390; // cant get above this on 32bit uints
pdec_init(k);
s=sum_of_lcm0(k); // #1+#2 k limited by bigint
s=sum_of_lcm1(k); // #1+#2+#3 k limited by bigint and pdec_init
pdec_exit();
And here measurement for the faster approach using pdec_init(1000)
and 64bit bigints:
[ 0.507 ms] pdec init 38 KBytes
[ 1.222 ms] k= 100 sum= 18446224
[ 4.613 ms] k= 200 sum= 295280840
[ 10.734 ms] k= 300 sum= 1485491616
[ 21.983 ms] k= 400 sum= 4692090228
[ 38.277 ms] k= 500 sum= 11454503268
[ 52.168 ms] k= 600 sum= 23723529128
[ 67.608 ms] k= 700 sum= 43946379280
[ 83.128 ms] k= 800 sum= 74987181444
[ 106.942 ms] k= 900 sum=120019988208
[ 132.031 ms] k=1000 sum=183011304660
Here David Eisenstat's approach using the 2D prime decomposition LUT using unsigned math and ported into C++
bigint sum_of_lcm2(int k)
{
int g,*p,n,a,a0,i,sign;
bigint s,f;
// sum of lcm(i,j) up to k
for (s=0,g=1;g<=k;g++)
{
f=k/g;
f=((f+1)*f)>>1;
f*=f;
f*=g;
p=pdec_LUT[g];
n=pdec_num[g];
for (sign=0,a=0,i=0;i<n;i++)
{
a0=a; a=p[i];
if (a0==a) continue; // ignore powers
f*=(a-1);
sign=!sign;
}
if (sign) s-=f; else s+=f;
}
return s;
}
and measurements:
[ 62.085 ms] pdec init 18888 KBytes
[ 0.685 ms] k= 1000 sum= 183011304660
[ 1.373 ms] k= 2000 sum= 2926354006228
[ 2.072 ms] k= 3000 sum= 14805042797184
[ 2.781 ms] k= 4000 sum= 46781822900688
[ 3.539 ms] k= 5000 sum= 114218125116052
[ 4.756 ms] k= 6000 sum= 236808736848248
[ 5.223 ms] k= 7000 sum= 438721136165016
[ 6.806 ms] k= 8000 sum= 748454504589824
[ 7.645 ms] k= 9000 sum= 1198740116138576
[ 8.424 ms] k= 10000 sum= 1827127167830060
[ 9.303 ms] k= 11000 sum= 2675130127257140
[ 10.292 ms] k= 12000 sum= 3788610508835184
[ 10.220 ms] k= 13000 sum= 5218119174379260
[ 10.736 ms] k= 14000 sum= 7019284675919704
[ 10.782 ms] k= 15000 sum= 9249622447024312
[ 13.937 ms] k= 16000 sum= 11973761092807672
[ 14.610 ms] k= 17000 sum= 15260078673441888
[ 14.535 ms] k= 18000 sum= 19179420697925096
[ 15.512 ms] k= 19000 sum= 23809411309067844
[ 15.049 ms] k= 20000 sum= 29232792766430776
[ 15.454 ms] k= 21000 sum= 35530729593528460
[ 19.366 ms] k= 22000 sum= 42798909693505916
[ 19.164 ms] k= 23000 sum= 51128789507207824
[ 17.958 ms] k= 24000 sum= 60615343952225808
[ 18.378 ms] k= 25000 sum= 71367596207492300
[ 22.220 ms] k= 26000 sum= 83490183930172416
[ 22.565 ms] k= 27000 sum= 97093514169469316
[ 21.821 ms] k= 28000 sum= 112296054697946652
[ 24.109 ms] k= 29000 sum= 129222336308688284
[ 24.484 ms] k= 30000 sum= 147985909450776556
[ 23.518 ms] k= 31000 sum= 168726314083539724
[ 24.437 ms] k= 32000 sum= 191573248859171168
[ 25.810 ms] k= 33000 sum= 216665462512674040
[ 26.562 ms] k= 34000 sum= 244144733826449780
[ 25.961 ms] k= 35000 sum= 274165636445073084
[ 26.890 ms] k= 36000 sum= 306861019309280956
[ 27.347 ms] k= 37000 sum= 342403256303560068
[ 28.250 ms] k= 38000 sum= 380951178712974600
[ 28.792 ms] k= 39000 sum= 422654544640465536
[ 32.039 ms] k= 40000 sum= 467702202175553120
[ 31.689 ms] k= 41000 sum= 516261770711667188
[ 32.103 ms] k= 42000 sum= 568493102008765720
[ 33.605 ms] k= 43000 sum= 624601491354014924
[ 33.251 ms] k= 44000 sum= 684766511634347904
[ 38.647 ms] k= 45000 sum= 749168090843424836
[ 34.710 ms] k= 46000 sum= 818010181586572760
[ 36.381 ms] k= 47000 sum= 891505184775229460
[ 35.670 ms] k= 48000 sum= 969815208006099344
[ 36.745 ms] k= 49000 sum= 1053194632366715464
[ 39.265 ms] k= 50000 sum= 1141860626298697932
[ 41.430 ms] k= 51000 sum= 1235961937510193376
[ 39.108 ms] k= 52000 sum= 1335793856714706980
[ 39.395 ms] k= 53000 sum= 1441568696460268148
[ 41.918 ms] k= 54000 sum= 1553459252290960280
[ 40.703 ms] k= 55000 sum= 1671762162244218552
[ 41.705 ms] k= 56000 sum= 1796724219557144384
[ 43.305 ms] k= 57000 sum= 1928541735647080528
[ 49.717 ms] k= 58000 sum= 2067474716400847492
[ 43.599 ms] k= 59000 sum= 2213789303836196316
[ 48.271 ms] k= 60000 sum= 2367722439679998864
[ 45.474 ms] k= 61000 sum= 2529554127958058476
[ 47.711 ms] k= 62000 sum= 2699556160225194044
[ 50.947 ms] k= 63000 sum= 2877966575602455352
[ 53.505 ms] k= 64000 sum= 3065104550449828668
[ 54.559 ms] k= 65000 sum= 3261217251285430196
[ 53.066 ms] k= 66000 sum= 3466580213396622284
[ 51.004 ms] k= 67000 sum= 3681481683988726580
[ 52.176 ms] k= 68000 sum= 3906300655499965688
[ 52.597 ms] k= 69000 sum= 4141140842108345608
[ 54.037 ms] k= 70000 sum= 4386474829016834456
[ 55.015 ms] k= 71000 sum= 4642634007103668512
[ 56.036 ms] k= 72000 sum= 4909694650694395656
[ 56.753 ms] k= 73000 sum= 5188180625623422844
[ 56.944 ms] k= 74000 sum= 5478395966863778240
[ 58.519 ms] k= 75000 sum= 5780503390808063112
[ 58.851 ms] k= 76000 sum= 6095030347012208488
[ 59.481 ms] k= 77000 sum= 6422238822380824088
[ 59.821 ms] k= 78000 sum= 6762431127210826000
[ 60.474 ms] k= 79000 sum= 7115946997843513388
[ 62.483 ms] k= 80000 sum= 7483237599224918568
[ 62.773 ms] k= 81000 sum= 7864354728203024788
[ 63.597 ms] k= 82000 sum= 8259936085084243856
[ 63.737 ms] k= 83000 sum= 8670356746307209408
[ 65.268 ms] k= 84000 sum= 9095726603620478688
[ 66.408 ms] k= 85000 sum= 9536717580904610040
[ 66.726 ms] k= 86000 sum= 9993536125103095688
[ 67.575 ms] k= 87000 sum= 10466432130003173312
[ 68.633 ms] k= 88000 sum= 10956004885713459408
[ 68.506 ms] k= 89000 sum= 11462647299842391920
[ 69.760 ms] k= 90000 sum= 11986403043182154584
[ 70.185 ms] k= 91000 sum= 12528192303586301056
[ 71.192 ms] k= 92000 sum= 13088114114334229944
[ 72.118 ms] k= 93000 sum= 13666442859056921956
[ 72.514 ms] k= 94000 sum= 14263784093996812372
[ 73.450 ms] k= 95000 sum= 14880480246958258764
[ 75.154 ms] k= 96000 sum= 15516955556268238668
[ 75.907 ms] k= 97000 sum= 16173675941972720104
[ 76.416 ms] k= 98000 sum= 16851041998775635308
[ 76.931 ms] k= 99000 sum= 17549420431062508128
[ 77.511 ms] k=100000 sum= 18269345553999897648
It can be speeded up a bit more by removing the powers of primes from LUT. However for big k
this will not fit into memory so in such case you need 1D prime LUT and do the decomposition on the run which is much slower but does not need as much memory. IIRC there where some fast methods of prime decomposition when list of primes up to sqrt(k)
is known using FFT or NTT (unless I miss match it in memory with some different problem)