6

We have an OSGi container with a lot of products running inside, one of them being our product.
We have some performance tests running and there is this weird problem that, every OSGi container restart will result in a performance deviation for some of our tests up to 400%.
Through some testing and things I was able to track this down to this method:

public static Method getMethodForSlotKey(Class<?> cls, String slotKey, String methodType) {

    Method[] methods = cls.getMethods();

    if (methods != null && methods.length > 0) {
        for (Method method : methods) {
            String methName = method.getName();
            if (methName.startsWith(methodType)) {

                IDataAnnotation annot = method.getAnnotation(IDataAnnotation.class);
                if (annot != null) {
                    String annotSlotKey = annot.SlotKey();
                    if (annotSlotKey != null && annotSlotKey.equals(slotKey)) {
                        Class<?>[] paramTypes = method.getParameterTypes();
                        // for now, check length == 1 for setter and 0 for getter.
                        int len = SET_TXT.equals(methodType) ? 1 : 0;
                        if (paramTypes != null && paramTypes.length == len) {
                            return method;
                        }
                    }
                }
            }
        }
    }
    return null;
}

This method mainly does reflection and string comparison.

Now, what I did is to cache the results of this method and instantly our deviation goes down to 10 - 20%. Of course, this method is called often, so that there is am improvement is obvious.

Still I don't understand why the non-cached version has such a high deviation with the only difference being a OSGi / JVM restart? What exactly may happen during the restart? Are there any known performance issues for different classloaders for instance? Is it possible that in the OSGi environment libraries will be loaded in a different order between the restarts?

I am searching for a clue here for this to make sense.

UPDATE

It turns out that this call:

Method[] methods = cls.getMethods();

is causing the deviation. I still don't understand it, so if anyone does I'd be happy to hear about it.

sveri
  • 1,372
  • 1
  • 13
  • 28
  • 2
    Some reflection based calls are re-compiled and optimised by the JVM if the function is called 15000 times. After that, they get much more effective. Try to call your function 15 thousand times and see if the performance goes up. If it does, this is the case. The number after code is re-compiled can be configured in the JVM, but I do not remember the parameter names. Also you can set in the JVM to log out these re-compilation tasks. – Balazs Zsoldos Jan 13 '15 at 20:19
  • Hey, I did not find this option, maybe you can remember the exact name? – sveri Jan 16 '15 at 12:53
  • http://artiomg.blogspot.hu/2011/10/just-in-time-compiler-jit-in-hotspot.html. I guess you can set the number of function calls with -XX:CompileThreshold=XXX and print out the compilation and statistics with "XX:+PrintCompilation" and "-XX:-CITime". If this is the answer, I will answer it properly and take the bounty ;-) – Balazs Zsoldos Jan 16 '15 at 13:03
  • I just tried it with adding: -XX:CompileThreshold=500. Doesn't make a difference, still have the deviation of up to 400%. – sveri Jan 16 '15 at 15:21
  • "OSGi container restart will result in a performance deviation": This means that if the OSGi container runs a while (and code runs on it), the tests are four times faster? I thought you meant that. If that is the case, I cannot think anything else but JIT optimizations. Do you experience the same outside of OSGi? – Balazs Zsoldos Jan 16 '15 at 16:39
  • The OSGi container is encapsulated by a server, as long as the server is up, timings are stable, but as soon as I restart the server (OSGi container) the timing will change and remain stable then again: Start: cont. respond within 1s Restart: cont. respond within 3s Restart: cont. respond within 4s Restart: cont. respond within 2.5s... – sveri Jan 16 '15 at 19:10
  • Does the cls parameter contain different classes. If you look into the source code of Class, you will see that during the first call of getMethods() all the reflection data is generated. Next time a cached version is returned. That means that the first call for each type is way much slower than the second. See the source of privateGetPublicMethods() of Class. – Balazs Zsoldos Jan 16 '15 at 20:43
  • Can you provide an example project at an opensource hub (like github)? I would be really interested to profile this one. – Balazs Zsoldos Jan 19 '15 at 23:11
  • Sorry, I cannot, this is proprietary business stuff and I am not allowed to disclose more of it. And worse, I have not found a way to reproduce it in a very small code base myself. – sveri Jan 20 '15 at 08:22

2 Answers2

1

It seems like this is another case of the intricate JIT, which seems to constantly change and evolve with time. This answer may shed some light on your issue: Java: what is JITC's reflection inflation?

I believe you are observing the aftereffects of how JIT optimizes your reflection calls. For a while (the first X invocations) would be slow until JIT decides to optimize your method.

Something you can try - to test this theory - is to modify your code and put the offending call in a dummy loop and execute it a number of times on the very first call (make sure you do something with the result from the call to force java to not optimize away the call completely, if it figures it has no consequences). If this results in a very long execution time for the first use and then very low variance on the subsequent calls, then the theory sounds correct.

Community
  • 1
  • 1
xpa1492
  • 1,953
  • 1
  • 10
  • 19
  • Hm, that is basically what is happening during the warmup. Before we run the tests we can observer exactly this behavior. Calls taking a relativley long time, but then, after they were executed for some time they stabilize on some average. However, does the JIT optimize methods differently after each JVM restart so that the response times do differ after each JVM restart? – sveri Jan 26 '15 at 08:28
  • 1
    It is quite possible that the JVM optimize things differently - in theory it gathers statistics based on how methods are called and then decides how to optimize the code according to these statistics. If the frequency of execution or parameters passed it vary after JVM restarts, the optimization may be different. This is theory at least, in practice this optimization process seems to vary way too much across JVMs, JVM settings and platforms... – xpa1492 Jan 26 '15 at 08:45
  • Again, the setup was always the same and the number of method calls was exactly the same too, so I would suppose the call order of the methods is the same too, but, I did not check call order, so that might be one difference. – sveri Jan 26 '15 at 14:52
0

I suspect it is because your reboot of your server / OSGi container is clearing the cached Method[]* in the class(es) you are interrogating. The cache will be cleared to ensure correct operation in the situation where the reloaded class has a different public API (i.e. different methods) than the old one with the same name (this could happen e.g. if you load a newer version)

When you implement your cache of your 'wrapper' method, you are circumventing this and introducing an assumption - that the reloaded classes will have the same methods that the previously loaded classes with the same name had. Probably quite a safe assumption for most cases, but not one you would want core Java to make.

The source code for the (OpenJDK) Class class is here. If you look through it, you will see that getMethods() calls privateGetPublicMethods(). If the cached publicMethods array* is null, or is made null by clearCachesOnClassRedfinition(), then the class will have to go through the relatively intensive process of finding all the public methods in the class and any super classes it inherits from and interfaces it implements. This is then cached and the speed of getMethods() will return to normal.


*strictly, the cached value is a soft reference to an array.

J Richard Snape
  • 20,116
  • 5
  • 51
  • 79
  • Hi, maybe I did not make it clear enough and should update my question, but the response times remain stable after an OSGi restart. So, the speed of getMethods does not change during the runtime of the environment. The response time will only deviate between several restarts, but then remain stable during the uptime, that's what drives me so crazy. – sveri Jan 19 '15 at 14:36
  • Oh. Got it now, that is wierd. I'll think about it – J Richard Snape Jan 19 '15 at 14:49
  • Sorry for asking the stupid question, I'm sure you've thought about this. From the code calling your method - is it possible that it gets called far more times if there is a different order of, say, library loading or thread starting? Can you count the calls of the method somehow (if you haven't already). – J Richard Snape Jan 19 '15 at 16:17
  • There are no stupid questions, so, yea, I did count how often the method is called and it was exactly the same, couldn't believe it myself, but, it is correct from a programmatic point of view. – sveri Jan 19 '15 at 22:13
  • It's puzzling. Just had one last thought. Logically, I think it must be cache related. I wonder whether `clearCachesOnClassRedfinition()` is clearing the cache every/most times it's called from `Class.privateGetPublicMethods()`. I haven't a concrete reason why, but it would be along the lines that for some particular OSGi boot sequences, the JVM is marking the classes redefined regularly, while for others it isn't. To test towards this: to time each call of `cls.getMethod()` and output to a log. Check that on the faster runs, caching effect appears, slower ones it doesn't. If so, we've a clue – J Richard Snape Jan 22 '15 at 10:46