First and foremost, your benchmark lacks any warmup. Here is some more realistic benchmark code:
    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    public class Test
    {
        public static void main(String[] args) {
            final Random rnd = new Random();
            // Repeat the measurement so that later runs execute JIT-compiled code.
            for (int i = 0; i < 20; i++) {
                // Fill the array with random values instead of leaving it all zeros.
                final double[] values = new double[10000];
                for (int j = 0; j < values.length; j++)
                    values[j] = rnd.nextDouble();
                testPow(values);
            }
        }

        private static void testPow(double[] values) {
            final long start = System.nanoTime();
            double sum = 0;
            for (int i = 0; i < values.length; i++)
                sum += Math.pow(values[i], 2);
            final long elapsed = System.nanoTime() - start;
            // Printing the sum keeps the computation from being optimized away.
            System.out.println("Sum: " + sum + "; Time elapse :: " + TimeUnit.NANOSECONDS.toMillis(elapsed));
        }
    }
I ran it on Java 8 and it gave 9 ms for one million elements. On Java 6 it gave 62 ms. (Sorry, I have no JDK 7 installed.)
On the other hand, if I don't initialize the array (as in your code), I get 0.9 ms for one million elements on Java 8, whereas Java 6 stays the same.
To reproduce your observation, just take the result from the first run and compare it with the improved results of later runs. The smaller the array, the more repetitions it takes to reach the JIT compiler's threshold for the number of iterations of the innermost loop, which is what triggers a given optimization.
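If you want to see the warmup effect directly, here is a minimal sketch that times the same work over several rounds and prints the first and the last measurement. The class name `WarmupDemo`, the helpers `sumOfSquares` and `timeRounds`, the fixed seed and the round count of 50 are my own choices for illustration, not anything from your code. The actual numbers will depend on your JVM and hardware.

```java
import java.util.Random;

public class WarmupDemo {
    // Written to a static field so the JIT cannot discard the computed sums.
    static volatile double sink;

    // The work being measured: sum of squares via Math.pow.
    static double sumOfSquares(double[] values) {
        double sum = 0;
        for (int i = 0; i < values.length; i++)
            sum += Math.pow(values[i], 2);
        return sum;
    }

    // Runs the work `rounds` times on the same data and returns
    // the elapsed time of each round in nanoseconds.
    static long[] timeRounds(double[] values, int rounds) {
        final long[] elapsed = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            final long start = System.nanoTime();
            sink = sumOfSquares(values);
            elapsed[r] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        final Random rnd = new Random(42); // fixed seed, an arbitrary choice
        final double[] values = new double[10000];
        for (int j = 0; j < values.length; j++)
            values[j] = rnd.nextDouble();

        final long[] elapsed = timeRounds(values, 50);
        System.out.println("First round: " + elapsed[0] + " ns");
        System.out.println("Last round:  " + elapsed[elapsed.length - 1] + " ns");
    }
}
```

The early rounds typically run in the interpreter or in less optimized code, so only the later rounds tell you anything about steady-state performance. For serious measurements a dedicated harness such as JMH takes care of the warmup for you.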