
We have a production web server built with IBM WebSphere, Spring IoC, Spring MVC, and Hazelcast. We use Hazelcast as the Spring session store. A single scheduled thread runs a health check every 60 seconds. During a large Excel export job, many OutOfMemoryErrors were thrown, and even after the job completed successfully, most requests to the server still fail with a stack trace like this:

[LargeThreadPool-thread-6481] ERROR xxxxFilter Exception captured in filter scope: org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space
  at org.springframework.web.DispatcherServlet.doDispatch(DispatcherServlet.java:982)
  at org.springframework.web.DispatcherServlet.doService(DispatcherServlet.java:901)
  xxx
  spring filter stack
  xxx
  IBM Websphere stack
Caused by: java.lang.OutOfMemoryError: Java heap space
  at java.lang.Class.getDeclaredFieldsImpl
  at java.lang.Class.getDeclaredFields
  at java.io.ObjectStreamClass.getDefaultSerialFields
  xxx
  at com.hazelcast.client.impl.proxy.ClientMapProxy.setAsync
  xxx
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run
  

WebSphere dumped four PHD and javacore files during the job execution; no further dumps were produced after the job completed.

I tried debugging the Hazelcast code: it does catch the OOME to do some client lifecycle management, but it does not rethrow it anywhere that another thread could access it.

My question: the scheduled thread should be dead after the OOME occurred. How can its execution repeatedly show up as the cause of failures in other threads?
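
For reference, here is a minimal sketch (class and method names are mine, not Hazelcast's) of the behavior I think I am seeing: if the periodic task swallows the OOME, the executor considers the run successful and keeps rescheduling it.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class SwallowedErrorDemo {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    healthCheck(); // may throw OutOfMemoryError under heap pressure
                } catch (Throwable t) {
                    // Swallowing the error means this run counts as successful,
                    // so the task stays scheduled and fires again 60s later.
                    // If t were rethrown, the executor would cancel the task.
                    System.err.println("health check failed: " + t);
                }
            }, 0, 60, TimeUnit.SECONDS);
        }

        static void healthCheck() {
            // stand-in for the real work that serializes the session
        }
    }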

NYoung

2 Answers


There are two possibilities:

  1. A retry mechanism in the code keeps re-triggering the failing work.
  2. The scheduler runs repeatedly and hits an OutOfMemoryError sometimes but not every time, depending on the load on the application.

The easiest solution would be to increase the heap (https://stackoverflow.com/a/69348545/175554); a better solution would be to change the job's code so it does not run out of memory, paging the data and processing it block by block, as in the sketch below.
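
For example, if the export uses Apache POI (an assumption; the question does not say which library is involved), the streaming SXSSFWorkbook keeps only a bounded window of rows on the heap and flushes the rest to temporary files:

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    public class StreamingExport {
        public static void main(String[] args) throws Exception {
            // Keep at most 100 rows in memory; older rows are flushed to disk.
            try (SXSSFWorkbook wb = new SXSSFWorkbook(100);
                 OutputStream out = new FileOutputStream("export.xlsx")) {
                Sheet sheet = wb.createSheet("data");
                for (int i = 0; i < 1_000_000; i++) {
                    Row row = sheet.createRow(i);
                    // In real code, also fetch the source data page by page,
                    // so query results don't pile up on the heap either.
                    row.createCell(0).setCellValue("row-" + i);
                }
                wb.write(out);
                wb.dispose(); // remove the temp files backing the flushed rows
            }
        }
    }

SXSSF trades random access and some speed for a bounded memory footprint, which is exactly the block-by-block processing suggested above.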

ozkanpakdil

The heap is shared by the whole process, not allocated to any particular thread. If any thread exhausts the heap, everything is affected.
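
Here's a contrived demo of that (illustrative only, not your application's code; run with a small heap such as -Xmx64m): one thread hoards memory, and the OutOfMemoryError can just as easily surface in another thread's tiny allocation.

    import java.util.ArrayList;
    import java.util.List;

    public class SharedHeapDemo {
        public static void main(String[] args) throws InterruptedException {
            Thread hog = new Thread(() -> {
                List<byte[]> hoard = new ArrayList<>();
                while (true) {
                    hoard.add(new byte[1_000_000]); // keep every block reachable
                }
            }, "hog");
            hog.start();

            while (hog.isAlive()) {
                try {
                    byte[] tiny = new byte[16]; // even a tiny request can fail
                } catch (OutOfMemoryError e) {
                    System.err.println("OOME in main thread, caused by the hog");
                }
                Thread.sleep(10);
            }
        }
    }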

An OOME means you don't have a large enough heap for your needs at that point in time. It could be that there's a leak somewhere that never releases memory it should, or it could be that you simply need a larger heap for this application's load.

Or it could be both: you have peak needs for a larger heap, but you also, say, forgot to close some resource that is holding onto memory, preventing it from being garbage collected.
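
As a generic illustration of that second case (hypothetical code, not yours): a stream opened per request and never closed keeps its buffers reachable, so the heap ratchets upward until some unrelated allocation gets the OutOfMemoryError.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class ResourceCloseExample {
        // Leaky: if read() throws, or the caller forgets close(), the stream
        // and anything referenced from it stay reachable.
        static int leaky(String path) throws IOException {
            InputStream in = new FileInputStream(path);
            return in.read();
        }

        // Safe: try-with-resources guarantees close() on every code path.
        static int safe(String path) throws IOException {
            try (InputStream in = new FileInputStream(path)) {
                return in.read();
            }
        }
    }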

There are WebSphere heap analysis tools you can use to try to determine which case it is, or you can open a Support Case with IBM if you have Support, and have them start that analysis and point you in the right direction.

When exporting an Excel file, it would not surprise me if the heap requirements are simply quite large. We have an application that uses a third-party library to work with Excel files, and it can definitely be memory-intensive. Depending on your library (or custom code), there may be ways to make that more memory-efficient, although likely at the cost of speed. But I'd also definitely look for a failure to close some opened resource(s).

Finally, I'm somewhat surprised Hazelcast is in the mix for this at all. It's definitely not good practice to put large items in the HttpSession.
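
A sketch of the alternative (attribute names are made up): store only a small key in the Hazelcast-backed session and park the bulky data elsewhere, so that the ClientMapProxy.setAsync call in your trace only ever serializes a few bytes.

    import javax.servlet.http.HttpSession;

    public class SessionUsage {
        // Anti-pattern: the whole export gets serialized into Hazelcast on
        // every session write (the path visible in your stack trace).
        static void bad(HttpSession session, byte[] exportBytes) {
            session.setAttribute("export", exportBytes);
        }

        // Better: keep the payload in a store keyed by ID (disk, object
        // storage, a cache with a TTL, ...) and put only the ID in the session.
        static void good(HttpSession session, String exportId) {
            session.setAttribute("exportId", exportId);
        }
    }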

dbreaux