
I work on an old Java 1.6 application which uses GWT 2.4.0. A few weeks ago it started to throw a lot of OutOfMemoryError exceptions, and I can't find out why... I can't make many modifications to this old app; migrating to JDK 1.8 or to a newer GWT version is too risky.

Example stack trace:

2021-01-27 09:40:39,937 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[localhost].[/borneIntranet]] Exception while dispatching incoming RPC call
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2734)
        at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
        at java.util.ArrayList.add(ArrayList.java:351)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.append(ServerSerializationStreamWriter.java:583)
        at com.google.gwt.user.client.rpc.impl.AbstractSerializationStreamWriter.writeInt(AbstractSerializationStreamWriter.java:100)
        at com.google.gwt.user.client.rpc.impl.AbstractSerializationStreamWriter.writeString(AbstractSerializationStreamWriter.java:134)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter$ValueWriter$10.write(ServerSerializationStreamWriter.java:165)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeValue(ServerSerializationStreamWriter.java:536)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeClass(ServerSerializationStreamWriter.java:709)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeImpl(ServerSerializationStreamWriter.java:748)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serialize(ServerSerializationStreamWriter.java:621)
        at com.google.gwt.user.client.rpc.impl.AbstractSerializationStreamWriter.writeObject(AbstractSerializationStreamWriter.java:126)
        at com.google.gwt.user.client.rpc.core.java.util.Collection_CustomFieldSerializerBase.serialize(Collection_CustomFieldSerializerBase.java:44)
        at com.google.gwt.user.client.rpc.core.java.util.ArrayList_CustomFieldSerializer.serialize(ArrayList_CustomFieldSerializer.java:39)
        at com.google.gwt.user.client.rpc.core.java.util.ArrayList_CustomFieldSerializer.serializeInstance(ArrayList_CustomFieldSerializer.java:51)
        at com.google.gwt.user.client.rpc.core.java.util.ArrayList_CustomFieldSerializer.serializeInstance(ArrayList_CustomFieldSerializer.java:28)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeImpl(ServerSerializationStreamWriter.java:740)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serialize(ServerSerializationStreamWriter.java:621)
        at com.google.gwt.user.client.rpc.impl.AbstractSerializationStreamWriter.writeObject(AbstractSerializationStreamWriter.java:126)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter$ValueWriter$8.write(ServerSerializationStreamWriter.java:153)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeValue(ServerSerializationStreamWriter.java:539)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeClass(ServerSerializationStreamWriter.java:709)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeImpl(ServerSerializationStreamWriter.java:748)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serialize(ServerSerializationStreamWriter.java:621)
        at com.google.gwt.user.client.rpc.impl.AbstractSerializationStreamWriter.writeObject(AbstractSerializationStreamWriter.java:126)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter$ValueWriter$8.write(ServerSerializationStreamWriter.java:153)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeValue(ServerSerializationStreamWriter.java:539)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeClass(ServerSerializationStreamWriter.java:709)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeImpl(ServerSerializationStreamWriter.java:748)
        at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serialize(ServerSerializationStreamWriter.java:621)
        at com.google.gwt.user.client.rpc.impl.AbstractSerializationStreamWriter.writeObject(AbstractSerializationStreamWriter.java:126)
        at com.google.gwt.user.client.rpc.core.java.util.Collection_CustomFieldSerializerBase.serialize(Collection_CustomFieldSerializerBase.java:44)
2021-01-27 09:40:41,594 ERROR [STDERR] 8312399 [scheduler_QuartzSchedulerThread] ERROR org.quartz.core.ErrorLogger - An error occurred while scanning for the next triggers to fire.
2021-01-27 09:40:42,206 ERROR [STDERR] org.quartz.JobPersistenceException: Failed to obtain DB connection from data source 'springNonTxDataSource.scheduler': java.sql.SQLException: Exception d'E/S: Connection reset [See nested exception:$
2021-01-27 09:40:42,208 ERROR [STDERR]  at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:168)
2021-01-27 09:40:42,208 ERROR [STDERR]  at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3784)
2021-01-27 09:40:42,208 ERROR [STDERR]  at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2756)
2021-01-27 09:40:42,208 ERROR [STDERR]  at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:272)
2021-01-27 09:40:42,208 ERROR [STDERR] Caused by: java.sql.SQLException: Exception d'E/S: Connection reset
2021-01-27 09:40:42,208 ERROR [STDERR]  at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:111)
2021-01-27 09:40:42,208 ERROR [STDERR]  at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:145)
2021-01-27 09:40:42,209 ERROR [STDERR]  at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:254)
2021-01-27 09:40:42,209 ERROR [STDERR]  at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:386)
2021-01-27 09:40:42,209 ERROR [STDERR]  at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:413)
2021-01-27 09:40:42,209 ERROR [STDERR]  at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:164)
2021-01-27 09:40:42,209 ERROR [STDERR]  at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:34)
2021-01-27 09:40:42,209 ERROR [STDERR]  at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:752)
2021-01-27 09:40:42,209 ERROR [STDERR]  at org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)
2021-01-27 09:40:42,209 ERROR [STDERR]  at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
2021-01-27 09:40:42,209 ERROR [STDERR]  at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1188)
2021-01-27 09:40:42,209 ERROR [STDERR]  at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
2021-01-27 09:40:42,209 ERROR [STDERR]  at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
2021-01-27 09:40:42,209 ERROR [STDERR]  at org.springframework.scheduling.quartz.LocalDataSourceJobStore$2.getConnection(LocalDataSourceJobStore.java:134)
2021-01-27 09:40:42,209 ERROR [STDERR]  at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:108)
2021-01-27 09:40:42,209 ERROR [STDERR]  at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:165)
2021-01-27 09:40:42,209 ERROR [STDERR]  ... 3 more


What is weird is that when I look at the JMX stats, the heap space is only 25% used and the perm space is 40% used. But in server.log I see the OutOfMemoryError, so how is that possible?

The server has 5120M of RAM.

The current launch parameters are:

/opt/jdk/bin/java -Dprogram.name=run.sh -server -Xms2560m -Xmx2560m -XX:MaxPermSize=768M 
-Djava.awt.headless=true -Xdebug -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n 
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9999 
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false 
-XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/
-Djava.net.preferIPv4Stack=true ...

I don't know how to proceed. I don't have much application logging either, so I can't tell what the users are really doing, such as displaying a GWT screen that loads a lot of data, or something else.

EDIT:

Here are some GC log files, one generated per day, up until the app crashed with an OOM.

https://file.io/e1140Wc80yhd

https://file.io/8jRxM1KIgPPB

https://file.io/WvUgIX3ftQ1o

user2178964
    there are some DB exceptions, it surely looks like someone is reading a big chunk of data from it. – Eugene Jan 27 '21 at 20:55
  • Thanks, but I can't make an AWR report on the database (no snapshots), and the JMX stats of the database server are OK at the time of the OutOfMemoryError... – user2178964 Jan 28 '21 at 09:23
  • It tries to allocate memory but can't; that's why it throws an OOME. That the heap is currently 25% used means nothing. My guess would be, as Eugene says, that there is a very big chunk of data being serialised and there isn't enough heap space for it (a minimal sketch of this follows these comments). – Erik Jan 28 '21 at 13:58
  • Thank you. I have an automatic restart when an OOME happens; is it necessary in this case? Since only 25% of the heap is used, if the allocation can't be satisfied, couldn't it just be refused while the rest of the application keeps running? And how do I find that "big chunk" of data? I could log every SQL request (Hibernate show_sql), but I think it would flood the logs and slow down production... – user2178964 Jan 28 '21 at 15:56
  • No, the application cannot live on. There is no way known to me to unrequest a memory request. While you can catch the Error and continue to operate, that is not advisable. Do you have any logs on the db side? – Erik Jan 28 '21 at 16:13
  • It seems it is a classic Oracle instance with only Statspack installed. I managed to get a report of today's requests, but not for the time of yesterday's error, because we only keep snapshots for the current day, which is weird. I have asked the DBA to change that and keep 7 days of snapshots. – user2178964 Jan 28 '21 at 16:33
  • Did you solve this? I would like to get my hands dirty with this. – Bishal Gautam May 03 '21 at 18:09
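
To illustrate the point made in the comments above, here is a minimal, hypothetical sketch (not code from the application): it reads the same heap figures that the JMX console shows, then requests a single allocation larger than the whole heap. With a 2560m heap like the one configured above, the second step fails immediately with java.lang.OutOfMemoryError: Java heap space, even though the "used" figure was low.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Minimal sketch: an OOME is thrown the moment one allocation request cannot be
// satisfied, no matter how little of the heap is in use at that instant.
public class BigAllocationDemo {

    public static void main(String[] args) {
        // Same data that backs the JMX HeapMemoryUsage attribute.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("Heap used: " + (heap.getUsed() / (1024 * 1024)) + " MB"
                + " of max " + (heap.getMax() / (1024 * 1024)) + " MB");

        // One oversized request (think: the backing array of a huge GWT-RPC
        // serialization buffer) fails right away with
        // "java.lang.OutOfMemoryError: Java heap space" on a heap of a few GB.
        int[] tooBig = new int[Integer.MAX_VALUE - 8];
        System.out.println("never reached: " + tooBig.length);
    }
}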

2 Answers


Nice puzzle you got there :-)

I've got some suggestions for you to try, as I suppose you are the only one who can reproduce this (if you are the only one that has the code and the data that the code operates on):

  1. Instead of the JMX bean, I would use GC logging because the information is realtime, directly from your Garbage Collector, see: https://gceasy.io for setup and analysis. E.g.: -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<GC-log-file-path>
  2. Try to increase -Xmx (instead of MaxPermSize) your way out of this one (at some point that is only going to succeed on a 64-bit instance, which might cost extra memory; use compressed oops to avoid that: https://www.baeldung.com/jvm-compressed-oops), but steer clear of Java having to use swap, and don't forget Java also needs memory for every thread's stack and for native code, on top of the Java heap.
  3. You could attach a Java debugger (e.g. Eclipse) to port 8787 and create an exception breakpoint that catches java.lang.OutOfMemoryError, then inspect the stack for the relevant values, to find that large dataset or a circular reference. WARNING: running Java with debugger breakpoints may slow down your application, so it might not be a great idea for a production environment, but on the other hand, making heap dumps isn't good for your application's response times either.
  4. You could load the heap dump from -XX:+HeapDumpOnOutOfMemoryError into (j)visualvm for analysis; that should be easier than jhat, but jhat might work too. Only dumps made with HeapDumpOnOutOfMemoryError contain all the information up to the point of the OOMError. Do not share the heap dump on the internet: every byte of private information (personal data, passwords, maybe bytecode from classes) becomes public information. Finally, tracing queries (JDBC, network, Statspack) might help as well, but may not be worth the hassle as it might be too much information to analyse easily.

If you need more help, try adding the GC logging output and the startup output of -XX:+PrintFlagsFinal to this post.

JohannesB
  • Thanks a lot for your answer. I have already had GC logging enabled for some weeks now. I'm going to add an example of the GC logs generated just before the crash. – user2178964 Apr 28 '21 at 21:32
  • I added the files ;) – user2178964 Apr 28 '21 at 22:02
  • Analysis of: `gc-*-2021-04-11-Crash_10h29.log` shows (as speculated) a very sudden increase in memory usage (see A & P graph): https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMDQvMjkvLS1nYy13ZWxkb21fY29tbWFuZGVzLTIwMjEtMDQtMTEtQ3Jhc2hfMTBoMjkubG9nLS02LTQyLTI5&channel=WEB Try analyzing the heapdump with visualvm / jvisualvm and if computing retained sizes take note of: https://stackoverflow.com/questions/8316002 – JohannesB Apr 29 '21 at 07:44
  • I disabled automatic heap dumps because they take up too much disk space. Every time I analysed them, the objects saturating the memory were Strings and char arrays, and they matched data from my database. – user2178964 Apr 29 '21 at 08:09
  • But the difficult point is to understand why so much of this data is suddenly in memory. Analyzing the code doesn't help; there is too much complexity everywhere. I need to identify which GWT screen or action is launched, but I don't want to add log.debug everywhere and enable it in production… – user2178964 Apr 29 '21 at 08:11
  • Those objects form a graph starting at the GC root (e.g. the stack frames of the request thread) towards the strings; maybe you can find the most important information at the beginning of the graph. I'm no GWT expert, but maybe some info can be gleaned from lower in the stack, like an HttpServletRequest object, although the (partial?) stack trace in this question does not tell anything about that. – JohannesB Apr 29 '21 at 09:03
  • I have put up two heap dump analysis reports (Google Drive files); do you see something there? – user2178964 Apr 29 '21 at 12:21
  • Ah yes, the reports that you uploaded to Google Drive show the complete stack traces and suggest you run your application on JBoss/Tomcat (or similar). This means you may find out more about your issue from any HTTP access log files you might have; they may tell you more about the URL, source IP address, time and HTTP status if you start looking for slow requests with an unsuccessful status code (5xx). See: https://tomcat.apache.org/tomcat-7.0-doc/api/org/apache/catalina/valves/AccessLogValve.html But no, I did not find a complete solution, because the report does not provide access to variables/references. – JohannesB Apr 29 '21 at 13:08
  • Very good catch! But I have already checked that. I have the complete access log, and even when it happens with few users, it's very difficult to catch the problem. For example, if the user launches his action 10 minutes before the OOM, the code keeps running in the background and saturates the memory, etc., and I'm not going to see an error in the access log. I'm trying to add some logging automatically with AspectJ right now; maybe it will be enough... – user2178964 Apr 29 '21 at 13:26
  • It seems your investigation turns away from the Heap Dump. If you cannot find what you are looking for in the HttpServletRequest data in the Heap Dump you could also try to log the entire requests/responses with e.g.: `tcpdump` or a proxy-server like: http://httpd.apache.org/docs/2.4/mod/mod_dumpio.html which might also help to find any related earlier requests (with the same cookie, TLS SessionID, source IP etc.) – JohannesB Apr 29 '21 at 13:45

I finally solved my OOM problem with the following steps:

1\ Create a thread dump every 10 seconds, and keep the last 4 hours of dumps. I can't store full heap dumps because I'm limited in disk space, so with this method I only store a lot of "small" files, and I can follow the activity in the minutes before and after the OOM.

I modified my crontab to add:

#Check threads for OOM
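# (percent signs are escaped as \% below because cron treats an unescaped % as a newline / start of stdin)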
* * * * * ( pidof java | xargs /opt/jdk/bin/jstack > /tmp/jstack`date "+\%d-\%m-\%Y:\%H:\%M:\%S"`.log )  
* * * * * ( sleep 10 ; pidof java | xargs /opt/jdk/bin/jstack > /tmp/jstack`date "+\%d-\%m-\%Y:\%H:\%M:\%S"`.log )  
* * * * * ( sleep 20 ; pidof java | xargs /opt/jdk/bin/jstack > /tmp/jstack`date "+\%d-\%m-\%Y:\%H:\%M:\%S"`.log )  
* * * * * ( sleep 30 ; pidof java | xargs /opt/jdk/bin/jstack > /tmp/jstack`date "+\%d-\%m-\%Y:\%H:\%M:\%S"`.log )  
* * * * * ( sleep 40 ; pidof java | xargs /opt/jdk/bin/jstack > /tmp/jstack`date "+\%d-\%m-\%Y:\%H:\%M:\%S"`.log )  
* * * * * ( sleep 50 ; pidof java | xargs /opt/jdk/bin/jstack > /tmp/jstack`date "+\%d-\%m-\%Y:\%H:\%M:\%S"`.log )  
0 * * * * find /tmp -name "jstack*" -type f -mmin +240 -delete

2\

When an OOM happens, I download the thread dump files from around 20 minutes before to 10 seconds after the time of the OOM.

3\

I open all the files and search for a class belonging to my own app: "com.al6.xxxxx"

I see that there is a thread which begins about 10 minutes before the OOM, and based on its ajp-xxx number, I can find it in every file until the OOM:

"ajp-0.0.0.0-8009-23" #7088 daemon prio-5 os-prio-0 tid-0x00007f7ffle1b000 nid-0x673 runnable [0x000000004cfa8000]
java. lang. Thread. State: RUNNABLE
at java.util.Arrays.copy0f(Arrays. java:3181)
at java.util.ArrayList.grow(ArrayList. java: 265)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:239)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:231)
at java.util.ArrayList.add(ArrayList. java:462)
at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.append(ServerSerializationStreamWriter.java:583)
at com.google.gwt.user.client.rpc.impl.AbstractSerializationStreamWriter.writeInt(AbstractSerializationStreamWriter.java:100)
at com.google.gwt.user.client.rpc.impl.AbstractSerializationStreamWriter.writeString(AbstractSerializationStreamWriter.java:134)
at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter$ValueWriter$10.write(ServerSerializationStreamWriter.java:165)
at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeValue(ServerSerializationStreamWriter.java:536) at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeClass(ServerSerializationStreamWriter.java:709)
at com.google.gwt.user.server.rpc.impl.ServerSerializationStreamWriter.serializeImpl(ServerSerializationStreamWriter.java:748)
at com.google.gwt.user .server. rpc. impl.ServerSerializationStreanWriter.serialize(ServerSerializationStreamWriter.java:621)
at com.google.gwt.user.client.rpc.impl.AbstractSerializationStreamWriter.writeObjectAbstractSerializationStreamWriter.java:126)

So it's clearly the guilty thread! But I always get only the same stack trace linked to GWT, without the launcher/parent of this running code.
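
As a side note, the manual search over the dump files described in step 3 could also be scripted. Here is a rough, hypothetical sketch (the /tmp location, the jstack file name prefix and the com.al6 package are the ones from this post; everything else is made up): it prints, for every dump, the header of each thread whose stack mentions an application class, so a long-running suspect like "ajp-0.0.0.0-8009-23" stands out across consecutive files.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

// Rough sketch: list, for every jstack dump in /tmp, the header line of each
// thread whose stack trace contains one of the application's classes.
public class FindAppThreads {

    public static void main(String[] args) throws Exception {
        File[] dumps = new File("/tmp").listFiles();
        if (dumps == null) {
            return;
        }
        for (File dump : dumps) {
            if (!dump.getName().startsWith("jstack")) {
                continue;                               // only the cron-generated files
            }
            BufferedReader in = new BufferedReader(new FileReader(dump));
            try {
                String header = null;                   // last thread header seen
                boolean reported = false;
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.startsWith("\"")) {        // jstack thread headers start with a quoted name
                        header = line;
                        reported = false;
                    } else if (!reported && header != null && line.contains("com.al6")) {
                        System.out.println(dump.getName() + " -> " + header);
                        reported = true;                // one line per thread per file
                    }
                }
            } finally {
                in.close();
            }
        }
    }
}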

So, searching by the name "ajp-0.0.0.0-8009-23", I finally traced the thread back through its creation, and found a portion of "server code"!

com.al6.serviceBaseEntry.getXXXXX 

This code returns an ArrayList with a lot of elements in it, sometimes 100, sometimes 1500. That's not a lot by itself, but each element of the list has about 30 attributes. This list of elements is sent to GWT to generate a drop-down list with autocompletion.

In theory, the user was supposed to type a sufficiently restrictive filter to get back 10-20 elements. But sometimes a user types a very generic filter, and the server-side method returns 1500 elements. GWT then tries to serialize the 1500 elements with all their attributes, and boom, OOM.

I just modified the UI-side code to force the user to enter a sufficiently restrictive filter, and added a server-side check limiting the number of results to 100.
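
For illustration, here is a rough sketch of what such a server-side check can look like (the class, method and DTO names are hypothetical; only the minimum-filter requirement and the 100-element cap correspond to what was actually done):

import java.util.ArrayList;
import java.util.List;

// Hypothetical server-side guard: refuse over-generic filters and cap the
// number of rows handed to GWT-RPC for serialization.
public class BaseEntryServiceImpl {

    private static final int MIN_FILTER_LENGTH = 3;
    private static final int MAX_RESULTS = 100;

    public List<BaseEntryDto> getEntries(String filter) {
        if (filter == null || filter.trim().length() < MIN_FILTER_LENGTH) {
            // Force the client to send a more restrictive filter instead of
            // letting the query return thousands of heavyweight rows.
            return new ArrayList<BaseEntryDto>();
        }
        List<BaseEntryDto> rows = loadFromDatabase(filter.trim());
        // Hard cap: never serialize more than MAX_RESULTS elements back to the client.
        if (rows.size() > MAX_RESULTS) {
            return new ArrayList<BaseEntryDto>(rows.subList(0, MAX_RESULTS));
        }
        return rows;
    }

    // Placeholder for the real DAO/Hibernate call.
    private List<BaseEntryDto> loadFromDatabase(String filter) {
        return new ArrayList<BaseEntryDto>();
    }

    // Placeholder DTO standing in for the real ~30-attribute entity.
    public static class BaseEntryDto {
    }
}

The same cap can of course also be pushed into the query itself (e.g. Hibernate's setMaxResults), so the extra rows are never loaded from the database in the first place.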

Since then, no more OOMs; it's solved!

As a complement, for those who can: think about updating GWT, even a little, as there are some changes that add details to stack traces. In my case, I think that with a newer version of GWT I would have had more details about the cause of the OOM in the final stack trace, without having to go through this whole search process.

user2178964