I had occasion to study the computation of GCD sums because the problem cropped up in a HackerEarth tutorial named GCD Sum. Googling turned up some academic papers with useful formulas, which I'm reporting here since they aren't mentioned in the MathOverflow article linked by deviantfan.
For coprime m and n (i.e. gcd(m, n) == 1) the function is multiplicative:
gcd_sum[m * n] = gcd_sum[m] * gcd_sum[n]
Powers e of primes p:
gcd_sum[p^e] = (e + 1) * p^e - e * p^(e - 1)
If only a single sum is to be computed then these formulas could be applied to the result of factoring the number in question, which would still be way faster than repeated gcd()
calls or going through the rigmarole proposed by Толя.
However, the formulas could just as easily be used to compute whole tables of the function efficiently. Basically, all you have to do is plug them into the algorithm for linear time Euler totient calculation and you're done - this computes all GCD sums up to a million much faster than you can compute the single GCD sum for the number 10^6 by way of calls to a gcd()
function. Basically, the algorithm efficiently enumerates the least factor decompositions of the numbers up to n in a way that makes it easy to compute any multiplicative function - Euler totient (a.k.a. phi), the sigmas or, in fact, GCD sums.
Here's a bit of hashish code that computes a table of GCD sums for smallish limits - ‘small’ in the sense that sqrt(N) * N does not overflow a 32-bit signed integer. IOW, it works for a limit of 10^6 (plenty enough for the HackerEarth task with its limit of 5 * 10^5) but a limit of 10^7 would require sticking (long)
casts in a couple of strategic places. However, such hardening of the function for operation at higher ranges is left as the proverbial exercise for the reader... ;-)
static int[] precompute_Pillai (int limit)
{
var small_primes = new List<ushort>();
var result = new int[1 + limit];
result[1] = 1;
int n = 2, small_prime_limit = (int)Math.Sqrt(limit);
for (int half = limit / 2; n <= half; ++n)
{
int f_n = result[n];
if (f_n == 0)
{
f_n = result[n] = 2 * n - 1;
if (n <= small_prime_limit)
{
small_primes.Add((ushort)n);
}
}
foreach (int prime in small_primes)
{
int nth_multiple = n * prime, e = 1, p = 1; // 1e6 * 1e3 < INT_MAX
if (nth_multiple > limit)
break;
if (n % prime == 0)
{
if (n == prime)
{
f_n = 1;
e = 2;
p = prime;
}
else break;
}
for (int q; ; ++e, p = q)
{
result[nth_multiple] = f_n * ((e + 1) * (q = p * prime) - e * p);
if ((nth_multiple *= prime) > limit)
break;
}
}
}
for ( ; n <= limit; ++n)
if (result[n] == 0)
result[n] = 2 * n - 1;
return result;
}
As promised, this computes all GCD sums up to 500,000 in 12.4 ms, whereas computing the single sum for 500,000 via gcd()
calls takes 48.1 ms on the same machine. The code has been verified against an OEIS list of the Pillai function (A018804) up to 2000, and up to 500,000 against a gcd-based function - an undertaking that took a full 4 hours.
There's a whole range of optimisations that could be applied to make the code significantly faster, like replacing the modulo division with a multiplication (with the inverse) and a comparison, or to shave some more milliseconds by way of stepping the ‘prime cleaner-upper’ loop modulo 6. However, I wanted to show the algorithm in its basic, unoptimised form because (a) it is plenty fast as it is, and (b) it could be useful for other multiplicative functions, not just GCD sums.
P.S.: modulo testing via multiplication with the inverse is described in section 9 of the Granlund/Montgomery paper Division by Invariant Integers using Multiplication but it is hard to find info on efficient computation of inverses modulo powers of 2. Most sources use the Extended Euclid's algorithm or similar overkill. So here comes a function that computes multiplicative inverses modulo 2^32:
static uint ModularInverse (uint n)
{
uint x = 2 - n;
x *= 2 - x * n;
x *= 2 - x * n;
x *= 2 - x * n;
x *= 2 - x * n;
return x;
}
That's effectively five iterations of Newton-Raphson, in case anyone cares. ;-)