Profiling method performance

Question

I'm trying to profile some code, but I'm running into some issues with what I think is Java's compiler being smart and changing the way the code works.

example methods:

public int method1(int bits)
{
    // seed is a long object field
    seed = seed * 0x5DEECE66DL + 0xBL & (1L << 48) - 1;
    return (int) (seed >>> 48 - bits);
}

public int method2(int bits)
{
    // seed is a long object field
    seed *= 0x5DEECE66DL;
    seed += 0xBL & (1L << 48) - 1;
    return (int) (seed >>> 48 - bits);
}

public int method3(int bits)
{
        // multiplier, seeds, n, and carry are long/long array object fields
        final long t = multiplier * seeds[n] + carry;
        // carry = t / b (done in an unsigned way)
        final long div32 = t >>> 32; // div32 = t / (b+1)
        carry = div32 + ((t & 0xFFFFFFFFL) >= 0xFFFFFFFFL - div32 ? 1L : 0L);
        // seeds[n] = (b-1)-t%b (done in an unsigned way)
        seeds[n] = 0xFFFFFFFEL - (t & 0xFFFFFFFFL) - (carry - div32 << 32) - carry & 0xFFFFFFFFL;
        final long result = seeds[n];
        n = n + 1 & r - 1;
        return (int) (result >>> 32 - bits);
}

Now, to compare the performance, I'm using this test rig:

// gen is the object containing the methods/fields
int result;
long start = System.currentTimeMillis();
for (int i = 0; i < 0x7FFFFFFF; ++i)
{
    result = gen.method1(32);
}
long end = System.currentTimeMillis();
System.out.println(end - start);

start = System.currentTimeMillis();
for (int i = 0; i < 0x7FFFFFFF; ++i)
{
    result = gen.method2(32);
}
end = System.currentTimeMillis();
System.out.println(end - start);

start = System.currentTimeMillis();
for (int i = 0; i < 0x7FFFFFFF; ++i)
{
    result = gen.method3(32);
}
end = System.currentTimeMillis();
System.out.println(end - start);

However, the results I'm getting back are rather odd:

7
3109
13402

method1 and method2 should both have roughly the same computation time as they compute the exact same thing. method3 I can understand taking longer, but looking at the amount of calculations, none of the 3 results seem long enough to me. This seems to flag to me that somehow Java's optimizing my test rig and not running it all 0x7FFFFFFF times.

To fix this, I thought of declaring result as a static volatile field (outside of the test rig method), and doing that generated believable results:

21814
21468
26962

Now my question is is this the accepted way to profile methods? I would still like optimizations to occur in my methods I'm profiling, just not in the test rig (at least not optimizations which would result in the method being called less than the prescribed number of times or caching the results between runs).

issue javap -v on your class to see what the compiler does differently. I would think the difference would be an extra LLOAD and LSTORE op (done 0x7FFFFFFF times.) — MeBigFatGuy, Apr 26 '11 at 02:14
The only difference I saw was in the declaration line of result as either being just static or static and volatile. Diff: ` { -static int result; +static volatile int result; public static void main(java.lang.String[])` — helloworld922, Apr 26 '11 at 02:33

score 4 · Accepted Answer · edited May 23 '17 at 12:29

4

You should be using a profiler like YourKit or similar on your application as a whole. Focus on the problem areas and not the random number generator that is unlikely a bottleneck of any sort. Do you really have a use case that consumes more than 80 million per second? Just about anything else is going to take more time than these methods. You're almost certainly worrying about the wrong thing for the big picture.

If you really want to or have a good reason to continue this approach at least read this first: How do I write a correct micro-benchmark in Java? There are all sorts of factors that affect a micro-benchmark and translating them to the real world rarely matters or works. A profiler will tell you what you really need to know and the rest isn't worth spending much time on.

edited May 23 '17 at 12:29

Community

1
1

answered Apr 26 '11 at 02:35

WhiteFang34

70,765
18
106
111

I'm doing this more as a curiosity as to the raw performance of different generators, not necessarily what they perform like in addition to other code. Thanks for the links, I'll take a look at them. – helloworld922 Apr 26 '11 at 02:40
Curiosity killed the cat ;) You want a micro-benchmark if you're just curious about raw performance of generators. I'd just go with the Mersenne Twister and not worry much more about it though (unless you're using it for cryptographic purposes). – WhiteFang34 Apr 26 '11 at 02:52
good thing I'm a dog person :D I took some of those factors of a micro-benchmark into account and the results are much more believable now. – helloworld922 Apr 26 '11 at 03:01
Arf. Good to hear that helped. – WhiteFang34 Apr 26 '11 at 03:06

score 2 · Answer 2 · answered Apr 26 '11 at 02:07

2

I think this is the wrong approach. I don't think that the tests are meaningful predictors of what you'll experience in production if you try to circumvent the runtime optimizer. I'd recommend abandoning this avenue.

answered Apr 26 '11 at 02:07

duffymo

305,152
44
369
561

does it circumvent all optimization, or just optimization in the test rig? I just want to know what are the times it takes these particular methods to run as if they were called independently that many times. – helloworld922 Apr 26 '11 at 02:36

score 0 · Answer 3 · answered Apr 26 '11 at 08:14

You should perform the test in different orders as the JVM can otimise the code in different ways as the code runs. i.e. the compiler doe snext to no optimisations so you can run code several times in the same program and its performance can change, usually it improves.

Note: for short loops, where you discard the result, the JVM can determine you don't do anything useful and discard the loop. Instead of determining the loops runtime, you are determining how long it takes the JVM to discard the loop.

If you use volatile, this can be more expensive than what you are testing, which could be one reason the results are much the same when using voltile. ;)

A simple way to avoid the JVM optimising the loop to nothing is to add the results and the print the total (which you can ignore)

Profiling method performance

3 Answers3