[Short answer: Bad benchmarking methodology. You'd think I'd have figured that out by now.]
The problem is presented as "find a method for calculating x^y quickly, where x and y are positive integers". A typical "fast" algorithm looks like this:
public long fastPower(int x, int y) {
    // Replaced my code with the "better" version described below,
    // but this version isn't measurably faster than what I had before.
    long base = x; // use a long here; otherwise base *= base may overflow.
    long result = y % 2 == 1 ? x : 1;
    while (y > 1) {
        base *= base;
        y >>= 1;
        if (y % 2 == 1) result *= base;
    }
    return result;
}
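To see what the loop is doing, here's a trace of fastPower(2, 29), where 29 is 11101 in binary (so 2^29 = 2^16 * 2^8 * 2^4 * 2^1):

// init:   result = 2^1             (y is odd)
// pass 1: base = 2^2,  y = 14      (low bit clear: no multiply)
// pass 2: base = 2^4,  y = 7       result = 2^1  * 2^4  = 2^5
// pass 3: base = 2^8,  y = 3       result = 2^5  * 2^8  = 2^13
// pass 4: base = 2^16, y = 1       result = 2^13 * 2^16 = 2^29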
I wanted to see how much faster this is than, say, calling Math.pow(), or using a naive approach that multiplies x by itself y times:
public long naivePower(int x, int y) {
    long result = 1;
    for (int i = 0; i < y; i++) {
        result *= x;
    }
    return result;
}
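Before timing anything, a quick sanity check (an addition, not part of the original benchmark) confirms the two implementations agree over the tested input range:

// Compare fastPower and naivePower over the benchmark's range:
// 2 <= x <= 3, 25 <= y <= 29.
for (int x = 2; x <= 3; x++) {
    for (int y = 25; y <= 29; y++) {
        if (fastPower(x, y) != naivePower(x, y)) {
            throw new AssertionError("mismatch at " + x + "^" + y);
        }
    }
}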
Edit: Okay, it's been pointed out to me (correctly) that my benchmarking code was not consuming the result, which throws everything off. Once I started consuming the results, I'm still seeing the naive approach run about 25% faster than the "fast" approach.
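For what it's worth, a framework like JMH handles consumption for you: anything returned from a @Benchmark method is fed to a Blackhole, so the JIT can't treat the call as dead code. A minimal sketch (my addition; it assumes fastPower is reachable from the benchmark class):

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class PowerBench {
    int[] bases = new int[10000];
    int[] exponents = new int[10000];
    int i = 0;

    @Setup
    public void setup() {
        Random rand = new Random(0);
        for (int j = 0; j < bases.length; j++) {
            bases[j] = 2 + rand.nextInt(2);
            exponents[j] = 25 + rand.nextInt(5);
        }
    }

    @Benchmark
    public long fast() {
        i = (i + 1) % bases.length;
        // Returning the result is enough; JMH consumes it internally.
        return fastPower(bases[i], exponents[i]);
    }
}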
Original text:
I was very surprised to find that the naive approach was 4x faster than the "fast" version, which was itself about 3x faster than the Math.pow() version.
My test uses 10,000,000 trials (and then 100 million, just to make absolutely sure the JIT has time to warm up), each with random values 2 <= x <= 3 and 25 <= y <= 29 (randomized to prevent the calls from being optimized away). I chose a narrow range of values that can't produce a result greater than 2^63, biased toward larger exponents to give the "fast" version an advantage. I pre-generate 10,000 pseudorandom numbers to keep that part of the code out of the timing.
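As a quick check of that bound (my addition): the largest value in the range, 3^29, needs only 46 bits, so every individual result fits comfortably in a long. (The running total does overflow, which is why it prints negative below, but that's harmless for comparing implementations.)

// 3^29 is the largest value produced; its bit length is well under 63.
System.out.println(java.math.BigInteger.valueOf(3).pow(29).bitLength()); // prints 46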
I understand that for small exponents the naive version could be expected to be faster: the "fast" version has two branches instead of one, and generally does about twice as many arithmetic/storage operations per iteration as the naive one. But for large exponents it runs only about log2(y) iterations instead of y, so I would still expect it to come out well ahead, even in its worst case.
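To put numbers on that: for y = 29 (binary 11101), the fast version performs 4 squarings plus 3 conditional multiplies, 7 in total, versus 29 multiplies for the naive loop. A throwaway counter (my addition, not part of the benchmark) makes this easy to check for any exponent:

// Counts the multiplies fastPower performs for a given exponent.
static int fastMultiplyCount(int y) {
    int count = 0;
    while (y > 1) {
        count++;                 // base *= base
        y >>= 1;
        if (y % 2 == 1) count++; // result *= base
    }
    return count;                // e.g. 7 for y = 29; the naive loop does y
}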
Anyone have any idea why the naive approach would be so much faster than the "fast" version, even with data biased toward the "fast" version (i.e. larger exponents)? Does the extra branch in that code account for that much of a difference at runtime?
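One way to test the branch hypothesis (a sketch of mine, not something I've measured) is a variant that selects the multiplier with a mask instead of an if:

public long fastPowerBranchless(int x, int y) {
    long base = x;
    long result = y % 2 == 1 ? x : 1;
    while (y > 1) {
        base *= base;
        y >>= 1;
        // mask is all ones when the low bit of y is set, all zeros otherwise,
        // so the factor is base or 1 with no conditional in the loop body.
        long mask = -(long) (y & 1);
        result *= 1 + ((base - 1) & mask);
    }
    return result;
}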
Benchmarking code (yes I know I should use some framework for "official" benchmarks, but this is a toy problem) - updated to warm up, and consume results:
PowerIf[] powers = new PowerIf[] {
    new EasyPower(), // just calls Math.pow() and casts to long
    new NaivePower(),
    new FastPower()
};
Random rand = new Random(0); // same seed for each run
int randCount = 10000;
int[] bases = new int[randCount];
int[] exponents = new int[randCount];
for (int i = 0; i < randCount; i++) {
    bases[i] = 2 + rand.nextInt(2);
    exponents[i] = 25 + rand.nextInt(5);
}
int count = 1000000000;
for (int trial = 0; trial < powers.length; trial++) {
    long total = 0;
    for (int i = 0; i < count; i++) { // warm up
        final int x = bases[i % randCount];
        final int y = exponents[i % randCount];
        total += powers[trial].power(x, y);
    }
    long start = System.currentTimeMillis();
    for (int i = 0; i < count; i++) {
        final int x = bases[i % randCount];
        final int y = exponents[i % randCount];
        total += powers[trial].power(x, y);
    }
    long end = System.currentTimeMillis();
    System.out.printf("%25s: %d ms%n", powers[trial].toString(), (end - start));
    System.out.println(total);
}
Produces output:
EasyPower: 7908 ms
-407261252961037760
NaivePower: 1993 ms
-407261252961037760
FastPower: 2394 ms
-407261252961037760
Playing with the parameters for the random numbers and the trial counts changes the absolute timings, but the ratios between the tests stay consistent with those shown.
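For completeness, the PowerIf pieces aren't shown above; a minimal reconstruction (the names come from the harness, the bodies are my best guess):

interface PowerIf {
    long power(int x, int y);
}

class EasyPower implements PowerIf {
    public long power(int x, int y) {
        return (long) Math.pow(x, y);
    }
    @Override
    public String toString() { return "EasyPower"; }
}

// NaivePower and FastPower wrap naivePower/fastPower the same way; each
// overrides toString() so the %25s labels in the printf match the output.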