
After updating the environment from WildFly 13 to WildFly 18.0.1 we experienced an OutOfMemoryError for direct buffer memory:

A channel event listener threw an exception: java.lang.OutOfMemoryError: Direct buffer memory
at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
at org.jboss.xnio@3.7.3.Final//org.xnio.BufferAllocator$2.allocate(BufferAllocator.java:57)
at org.jboss.xnio@3.7.3.Final//org.xnio.BufferAllocator$2.allocate(BufferAllocator.java:55)
at org.jboss.xnio@3.7.3.Final//org.xnio.ByteBufferSlicePool.allocateSlices(ByteBufferSlicePool.java:162)
at org.jboss.xnio@3.7.3.Final//org.xnio.ByteBufferSlicePool.allocate(ByteBufferSlicePool.java:149)
at io.undertow.core@2.0.27.Final//io.undertow.server.XnioByteBufferPool.allocate(XnioByteBufferPool.java:53)
at io.undertow.core@2.0.27.Final//io.undertow.server.protocol.http.HttpReadListener.handleEventWithNoRunningRequest(HttpReadListener.java:147)
at io.undertow.core@2.0.27.Final//io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:136)
at io.undertow.core@2.0.27.Final//io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:59)
at org.jboss.xnio@3.7.3.Final//org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
at org.jboss.xnio@3.7.3.Final//org.xnio.conduits.ReadReadyHandler$ChannelListenerHandler.readReady(ReadReadyHandler.java:66)
at org.jboss.xnio.nio@3.7.3.Final//org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:89)
at org.jboss.xnio.nio@3.7.3.Final//org.xnio.nio.WorkerThread.run(WorkerThread.java:591)

Nothing was changed on the application side. I looked at the buffer pools, and it seems that some resources are not freed. I triggered several manual GCs, but almost nothing was reclaimed. (Uptime 2h)

[Screenshot: buffer pool usage steadily growing]

Before, in the old configuration, it looked like this (Uptime >250h):

[Screenshot: buffer pool usage being freed correctly]
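
For reference, the "direct" pool shown in these charts can also be read programmatically through the standard java.lang.management.BufferPoolMXBean. This is only a small monitoring sketch (the class name DirectBufferStats is just illustrative, it is not application code):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectBufferStats {
    public static void main(String[] args) {
        // The pool named "direct" backs ByteBuffer.allocateDirect(); "mapped" covers memory-mapped files.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}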

Now I did a lot of research, and the closest thing I could find is this post here on SO. However, that case was in combination with WebSockets, and there are no WebSockets in use here. I read several (good) articles (1,2,3,4,5,6) and watched this video about the topic. I tried the following things, but none of them had any effect:

  1. The OutOfMemoryError occurred at 5GB because the limit defaults to the heap size, which is 5GB => I reduced -XX:MaxDirectMemorySize to 512m and then 64m, but then the OOM just occurs sooner
  2. I set -Djdk.nio.maxCachedBufferSize=262144
  3. I checked the number of IO workers: 96 (6 CPUs * 16), which seems reasonable. The system usually has short-lived threads (the largest pool size was 13), so I guess it cannot be the number of workers
  4. I switched back to ParallelGC since this was the default in Java 8. Now, when doing a manual GC, at least 10MB are freed. With G1 nothing happens at all. But still, neither GC can clean up the direct memory.
  5. I removed the <websockets> from the WildFly configuration just to be sure
  6. I tried to emulate it locally but failed (a minimal allocation sketch that at least reproduces the error message is shown right after this list)
  7. I analyzed the heap using Eclipse MAT and JXRay, but they just point to some internal WildFly classes. [Screenshot: JXRay showing 1544 objects referenced as DirectByteBuffers]
  8. I reverted Java back to version 8 and the system still shows the same behavior, thus WildFly is the most probable suspect.
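
Regarding point 6: the error message itself is easy to reproduce in isolation. The rough sketch below (my own test class, not application or WildFly code) simply keeps direct buffers reachable, so their Cleaners never run and the native memory is never returned; started with e.g. -XX:MaxDirectMemorySize=64m it dies with the same "OutOfMemoryError: Direct buffer memory". It only demonstrates the mechanism of a retained-reference leak, not WildFly's actual code path.

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Run with: java -XX:MaxDirectMemorySize=64m DirectBufferLeakDemo
public class DirectBufferLeakDemo {
    public static void main(String[] args) {
        List<ByteBuffer> retained = new ArrayList<>();
        int allocatedMb = 0;
        while (true) {
            // Every buffer stays reachable via 'retained': its Cleaner can never run,
            // so the backing native memory is never freed and the direct-memory limit
            // is eventually exhausted -> OutOfMemoryError: Direct buffer memory.
            retained.add(ByteBuffer.allocateDirect(1024 * 1024)); // 1 MiB each
            System.out.println("Allocated " + (++allocatedMb) + " MiB of direct memory");
        }
    }
}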

In Eclipse MAT one can also find these 1544 objects; they all have the same size. [Screenshot: Eclipse MAT showing the DirectByteBuffer references]

The only thing that did work was to deactivate the direct byte buffers in WildFly completely:

/subsystem=io/buffer-pool=default:write-attribute(name=direct-buffers,value=false)

However, from what I have read, this has a performance drawback (heap buffers typically need to be copied into temporary direct buffers for socket I/O)?

So does anyone know what the problem is? Any hints for additional settings / tweaks? Or was there a known WildFly or JVM bug related to this?

Update 1: Regarding the IO threads - maybe the concept is not 100% clear to me, because there is the ioThreads value [Screenshot: the ioThreads value of the IO worker] and there are the threads and thread pools [Screenshot: the worker's threads and thread pools].

From the definition one could think that per worker the configured number of ioThreads (in my case 12) is created? But still the number of threads / workers seems quite low in my case...
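
For what it's worth, my current understanding (to be taken with a grain of salt) is that each worker in the io subsystem has two separate settings: io-threads, the number of non-blocking selector threads, and task-max-threads, the upper bound of the pool used for blocking work. A rough standalone sketch of roughly what the subsystem does when it creates the worker, assuming the XNIO 3.x API (the class name and the literal values are only illustrative):

import org.xnio.OptionMap;
import org.xnio.Options;
import org.xnio.Xnio;
import org.xnio.XnioWorker;

public class WorkerSketch {
    public static void main(String[] args) throws Exception {
        // Roughly equivalent to <worker name="default" io-threads="12" task-max-threads="96"/>
        XnioWorker worker = Xnio.getInstance().createWorker(OptionMap.builder()
                .set(Options.WORKER_IO_THREADS, 12)       // non-blocking selector threads
                .set(Options.WORKER_TASK_MAX_THREADS, 96) // ceiling of the pool used for blocking tasks
                .getMap());
        System.out.println("I/O threads in this worker: " + worker.getIoThreadCount());
        worker.shutdown();
    }
}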

Update 2: I downgraded Java and it still shows the same behavior. Thus I suspect WildFly to be the cause of the problem.

  • can you roll back one of the version upgrades to see if you can reduce the "pool of suspects"? – Gus Oct 29 '21 at 14:38
  • @Gus That is one idea I also had. I'll try that and report back. – Lonzak Oct 29 '21 at 15:11
  • There's a discussion from an older version of WildFly here https://stackoverflow.com/questions/63519501/outofmemoryerror-direct-buffer-memory-when-using-websockets-in-wildfly; might be relevant since it looks like they didn't fix a "bug", just recommended a config change (reduce io worker thread count). – Gus Oct 29 '21 at 16:42
  • DirectByteBuffers created in Java have their native memory cleaned up when the Buffer object becomes unreachable (using Cleaners). The fact that this doesn't happen, up until an OOME happens, seems to imply a resource leak: something is holding on to the Buffer objects meaning the backing native memory can never be cleaned up. FWIW, this doesn't look like a JVM bug to me. More likely either you have to do something to explicitly release the resource (like call close() somewhere, or remove a global reference to something that indirectly references the buffers). – Jorn Vernee Oct 29 '21 at 19:05
  • FWIW, depending on how you do them, and the used GC, manual GCs might have no effect because the GC doesn't 'see' the native memory that is attached to the buffer. Usually you have to increase pressure on the Java heap first (e.g. by allocating large arrays in a loop), and _then_ do a manual GC to trigger the cleanup. – Jorn Vernee Oct 29 '21 at 19:10
  • @Gus I already mentioned that link inside my post - this is indeed the closest thing I could find to my problem. (And I am using the exact same wildfly version mentioned there) However reducing the number of workers didn't work for me since our number is quite low from the start. But maybe the concept of the workers in relation to IO threads is not fully clear to me. I'll update my post... – Lonzak Nov 01 '21 at 08:49
  • @JornVernee Thank you for your answer. I also suspect a potential leak however I don't know how to identify the class involved. In the OOM error Stacktrace the HttpReadListener is mentioned and in the JXRay screenshot only some internal wildfly classes - so it is difficult to release that unknown resource... – Lonzak Nov 01 '21 at 13:21
  • I'm not sure JXRay supports doing this, though I'd expect so, but you could try to go over the list of buffers in that cache and see what other things are referencing those buffers. FWIW, at least looking at the ByteBufferSlicePool.java here: https://github.com/xnio/xnio/blob/2769e40ff89150fd46b776c05a2d276c0acc6ece/api/src/main/java/org/xnio/ByteBufferSlicePool.java I can see things being added to that `directBuffers` list, but never removed, so they would stay around until the pool is GC'd. – Jorn Vernee Nov 01 '21 at 21:50
  • @Gus I reverted the java back to version 8 and it still shows the same behavior. Thus most probably wildfly 18.0.1 is causing the problems. – Lonzak Nov 04 '21 at 19:06
  • Check whether there are any open files of that process which should be closed (via the lsof command). Also check your source code for potential memory leaks: input streams or resources not closed properly. – mystery Nov 08 '21 at 12:10

2 Answers


After lots of analyzing, profiling, etc., I came to the following conclusions:

  • The OOM is caused by WildFly in version 18.0.1. It also exists in 19.1.0 (I did not test 20 or 21).
  • I was able to trigger the OOM fairly quickly when setting -XX:MaxDirectMemorySize to values like 512m or lower. I think many people don't experience the problem since by default this value equals the -Xmx value, which can be quite big. The problem occurs when using the REST API of our application.
  • As Evgeny indicated, XNIO is a high-potential candidate, since profiling narrowed it down to (or near) that area...
  • I didn't have the time to investigate further, so I tried WildFly 22, and there it is working. This version uses the latest XNIO package (3.8.4).
  • The direct memory remains quite low in WF 22 - around 10MB. One can see the count rising and falling, which wasn't the case before.

So the final fix is to update to WildFly version 22.0.1 (or higher).

Lonzak

Probably it's an XNIO problem. Look at this issue: https://issues.redhat.com/browse/JBEAP-728

  • I browsed through all the WildFly change logs but didn't find anything related. I didn't think of also checking JBoss EAP. However, that bug is from 2015 and did not occur in WildFly 13.0.1 (May 31, 2018). But now in WildFly 18.0.1 (from November 14, 2019) it suddenly appears? – Lonzak Nov 05 '21 at 14:31