
I have a server that has crashed multiple times by getting stuck in Unsafe.defineAnonymousClass while trying to create a lambda instance. Here is an example from one of the stack traces:

"MessageIterationThread (false, 1059)" #163386 prio=5 os_prio=0 cpu=56.89ms elapsed=28566.80s tid=0x00007fef183bb000 nid=0x78f6 runnable  [0x00007fece0693000]
   java.lang.Thread.State: RUNNABLE
    at jdk.internal.misc.Unsafe.defineAnonymousClass0(java.base@11.0.2/Native Method)
    at jdk.internal.misc.Unsafe.defineAnonymousClass(java.base@11.0.2/Unsafe.java:1223)
    at java.lang.invoke.InnerClassLambdaMetafactory.spinInnerClass(java.base@11.0.2/InnerClassLambdaMetafactory.java:320)
    at java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite(java.base@11.0.2/InnerClassLambdaMetafactory.java:188)
    at java.lang.invoke.LambdaMetafactory.metafactory(java.base@11.0.2/LambdaMetafactory.java:329)
    at java.lang.invoke.LambdaForm$DMH/0x00000008002e6840.invokeStatic(java.base@11.0.2/LambdaForm$DMH)
    at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@11.0.2/Invokers$Holder)
    at java.lang.invoke.BootstrapMethodInvoker.invoke(java.base@11.0.2/BootstrapMethodInvoker.java:127)
    at java.lang.invoke.CallSite.makeSite(java.base@11.0.2/CallSite.java:307)
    at java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(java.base@11.0.2/MethodHandleNatives.java:258)
    at java.lang.invoke.MethodHandleNatives.linkCallSite(java.base@11.0.2/MethodHandleNatives.java:248)
    at some random point in my code using this::someMethod

By crashing I mean that a thread got stuck at a point where it blocked other threads and caused a deadlock-like scenario. The point where the lambda creation gets stuck seems to be completely random. Multiple threads even get stuck at completely different points in the code, and at least one of them compromises the whole server.
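One way to narrow down such a hang without waiting for a manual thread dump is to scan the running JVM for threads whose top stack frame is the suspicious method. The following is only a sketch: the class name `StuckThreadScan` and the helper `threadsWithTopFrame` are hypothetical names chosen for illustration, and it assumes the scan runs inside the affected JVM (for example from a watchdog thread).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class StuckThreadScan {

    // Returns "name [state]" for every live thread whose top frame is methodName.
    // Hypothetical helper, not part of the original server code.
    static List<String> threadsWithTopFrame(String methodName) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> hits = new ArrayList<>();
        // false, false: skip monitor and synchronizer details to keep the dump cheap
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length > 0 && stack[0].getMethodName().equals(methodName)) {
                hits.add(info.getThreadName() + " [" + info.getThreadState() + "]");
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Threads currently inside the native call seen in the trace above
        System.out.println(threadsWithTopFrame("defineAnonymousClass0"));

        // findDeadlockedThreads() returns null when no deadlock cycle is detected
        long[] deadlocked = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        System.out.println(deadlocked == null
                ? "no deadlock cycle detected"
                : deadlocked.length + " threads in a deadlock cycle");
    }
}
```

Running the scan twice a few seconds apart and comparing the results would show whether the same thread stays in `defineAnonymousClass0`. Note that a thread stuck in native code while RUNNABLE is not reported by `findDeadlockedThreads()`, which only detects cycles of threads waiting on monitors or ownable synchronizers.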

I found the following post, which looked like it could be related, and made sure that it is not the cause here: "Java8 hangs up if getStackTrace() is called by one thread and lambda definition (via Unsafe.defineAnonymousClass) occurs in another thread".

The server runs on Ubuntu 18.04.4 LTS and the JVM is OpenJDK 64-Bit Server VM (11.0.2+9, mixed mode). The code is compiled with the same OpenJDK. These are the startup parameters:

-Xmx4000m 
-Xms4000m 
-XX:NewSize=128m 
-Xss400k
-XX:CMSInitiatingOccupancyFraction=70 
-XX:+UseConcMarkSweepGC 
-XX:+CMSParallelRemarkEnabled 
-XX:-OmitStackTraceInFastThrow 
-XX:MaxJavaStackTraceDepth=0 
-XX:PermSize=200m 
-XX:MaxPermSize=512m 
-XX:ReservedCodeCacheSize=256m  
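
To double-check which of these options the running JVM actually received, the arguments can be read back at runtime. This is a minimal sketch; the class name `JvmArgsDump` is a hypothetical name for illustration. The permanent generation was removed in Java 8, so `-XX:PermSize` and `-XX:MaxPermSize` most likely have no effect on 11.0.2, but that is an assumption worth verifying against the JVM's startup warnings.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JvmArgsDump {

    // Returns the JVM options passed on the command line (not the program arguments)
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        // One option per line, in the order the JVM received them
        inputArguments().forEach(System.out::println);
        // VM name and version, to confirm which runtime is really in use
        System.out.println(System.getProperty("java.vm.name")
                + " " + System.getProperty("java.vm.version"));
    }
}
```

Comparing this output between the test instance and the healthy production instance on the same machine is a cheap way to rule out a configuration difference.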

Fun fact: another instance of the server runs on the same physical machine with much higher usage and does not seem to have this problem. The server with the problem is the testing environment, which has less traffic but much more frequent restarts.

I would be grateful for suggestions about what could cause this method call to get stuck, or for an explanation of what the method does in its native part.

Turmfalke
  • I think code would help, including the lambda itself and related variables. It's tough for anyone to tell you what might be wrong with code, without the code. In this case, it seems it's trying to build an anonymous class somewhere, in the lambda itself or in a method invoked from the lambda. HTH – TonyG Jul 06 '20 at 15:17
  • @TonyG That's the method reference. Lambda expressions and method references are implemented by creating anonymous classes at runtime. – HTNW Jul 06 '20 at 15:19
  • @TonyG I would love to help with more code. The problem I see is that every lambda, method reference, or perhaps even any call to Unsafe.defineAnonymousClass seems to be able to cause this issue, so I am not sure what value a specific piece of code would add. I chose this part of a trace because it was the root of a lock and a verifiable part of the whole problem. But I also have traces from threads that do not even have custom code in their stack trace and were stuck for some time. If they would help you, I can add them. – Turmfalke Jul 06 '20 at 15:47
  • You guys are right. I misread. I think the underlying issue would be with lock contention, a threading issue, or with some other low-level resource. Note that it's getting stuck in defineAnonymousClass0 which returns a native object ... whatever that should be. It seems the issue is down further where the stack trace can't report. The flag CMSInitiatingOccupancyFraction seems suspect to me, playing with GC and thus lower-level resources, but I'm stretching now. The water is too deep for me, sorry, but I HTH. Good luck. – TonyG Jul 06 '20 at 16:10
  • It might help to mention the OpenJdk JVM version this is running on, and perhaps the compiler version this has been built with. Have you tried to reproduce the problem on a different machine, perhaps with a simulated/recorded workload? – Hulk Jul 07 '20 at 06:30
  • @Hulk I added the information. No attempts have been made to reproduce this problem on a different machine. I may try that at a later point, but at the moment it is not at the top of my priorities because of the work involved and because it is not clear to me what it would contribute. – Turmfalke Jul 07 '20 at 13:33
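
To illustrate HTNW's comment above, here is a minimal sketch of a method reference being linked at runtime. The class `LambdaLinkDemo` and its methods are hypothetical and only mirror the `this::someMethod` frame from the trace. The first time the line containing the method reference executes, the JVM links the call site through `LambdaMetafactory.metafactory`, which on JDK 11 generates a class via the `spinInnerClass` path shown in the stack trace.

```java
import java.util.function.Supplier;

class LambdaLinkDemo {

    private String someMethod() {
        return "hello";
    }

    Supplier<String> makeSupplier() {
        // The first execution of this line links the invokedynamic call site;
        // later executions reuse the already generated class.
        return this::someMethod;
    }

    public static void main(String[] args) {
        Supplier<String> supplier = new LambdaLinkDemo().makeSupplier();
        System.out.println(supplier.get());
        // The class name is generated at runtime and is implementation-specific
        System.out.println(supplier.getClass().getName());
    }
}
```

Because linkage happens lazily at the first execution of each call site, a freshly restarted server links many such sites while a long-running one has already linked most of them. That may be relevant to the observation that the frequently restarted test instance is the one affected, though this is a guess and not a confirmed cause.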
