I'm running a legacy Spring Boot (1.4.4) application with Thymeleaf (2.1.4) in production on Java 8 (1.8.0_161-b12 / HotSpot 64 25.161-b12). When the application experiences moderate load (1K reqs/min), all app requests experience latency (> 1min), and CPU on the app instances max out.
How can I pinpoint exactly where this performance bottleneck is occurring? If it is a code / design issue how can I be certain, or if I just need to allocate more resources?
Things I've tried:
- Scaling up the DB and increase connection pool size ( CPU is negligible )
- Disabled the template/expression (Thymeleaf) and query cache (Hibernate/couchbase) and experienced same latency.
- Scale up cluster (3x t2.large)
Other things to note:
- JVM heap is not configured, but doesn't appear to approach max heap size.
- JVM monitoring shows regular (Allocation Failed) garbage collection during load w/ ParallelGCFailedAllocation occurring frequently
- There's a few
Full GC
events occurring that take b/w 1-3 seconds. - A heap dump during peak latency reveals over 172 blocked threads that appear to be waiting on Thymeleaf cache PUT requests?!
"http-nio-8080-exec-168" - Thread t@488
java.lang.Thread.State: BLOCKED
at org.thymeleaf.cache.StandardCache$CacheDataContainer.putWithoutTracing(StandardCache.java:384)
- waiting to lock <4e409b1c> (a [Ljava.lang.Object;) owned by "http-nio-8080-exec-170" t@490
at org.thymeleaf.cache.StandardCache$CacheDataContainer.put(StandardCache.java:368)
at org.thymeleaf.cache.StandardCache.put(StandardCache.java:141)
at org.thymeleaf.spring4.expression.SpelVariableExpressionEvaluator.getExpression(SpelVariableExpressionEvaluator.java:184)
at org.thymeleaf.spring4.expression.SpelVariableExpressionEvaluator.evaluate(SpelVariableExpressionEvaluator.java:129)
at org.thymeleaf.standard.expression.VariableExpression.executeVariable(VariableExpression.java:154)
at org.thymeleaf.standard.expression.SimpleExpression.executeSimple(SimpleExpression.java:59)
at org.thymeleaf.standard.expression.Expression.execute(Expression.java:103)
at org.thymeleaf.standard.expression.Expression.execute(Expression.java:133)
at org.thymeleaf.standard.expression.Expression.execute(Expression.java:120)
at org.thymeleaf.standard.processor.attr.AbstractStandardLocalVariableDefinitionAttrProcessor.getNewLocalVariables(AbstractStandardLocalVariableDefinitionAttrProcessor.java:89)
at org.thymeleaf.processor.attr.AbstractLocalVariableDefinitionAttrProcessor.processAttribute(AbstractLocalVariableDefinitionAttrProcessor.java:62)
at org.thymeleaf.processor.attr.AbstractAttrProcessor.doProcess(AbstractAttrProcessor.java:87)
at org.thymeleaf.processor.AbstractProcessor.process(AbstractProcessor.java:212)
at org.thymeleaf.dom.Node.applyNextProcessor(Node.java:1017)
at org.thymeleaf.dom.Node.processNode(Node.java:972)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.Document.process(Document.java:93)
at org.thymeleaf.TemplateEngine.process(TemplateEngine.java:1155)
at org.thymeleaf.TemplateEngine.process(TemplateEngine.java:1060)
at org.thymeleaf.TemplateEngine.process(TemplateEngine.java:1011)
at org.thymeleaf.spring4.view.ThymeleafView.renderFragment(ThymeleafView.java:335)
at org.thymeleaf.spring4.view.ThymeleafView.render(ThymeleafView.java:190)
at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1271)
Since the application has many layers, is there a way to nail down the culprit?
UPDATE:
OK so after digging some more, a few more things:
- Another thread dump during another point in the load testing shows a bunch of blocked threads on hibernate calls to the database.
java.lang.Thread.State: BLOCKED
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:543)
- waiting to lock <3d1c45eb> (a java.lang.Object) owned by "http-nio-8080-exec-270" t@823
at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:220)
at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
at sun.security.ssl.CipherBox.createExplicitNonce(CipherBox.java:1015)
at sun.security.ssl.OutputRecord.encrypt(OutputRecord.java:248)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:869)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:857)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
- locked <7721ca63> (a sun.security.ssl.AppOutputStream)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <2135117a> (a java.io.BufferedOutputStream)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3725)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2506)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2677)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2549)
- locked <5c36dc6f> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
- locked <5c36dc6f> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1962)
- locked <5c36dc6f> (a com.mysql.jdbc.JDBC4Connection)
at sun.reflect.GeneratedMethodAccessor254.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.github.gquintana.metrics.proxy.MethodInvocation.proceed(MethodInvocation.java:75)
at com.github.gquintana.metrics.sql.PreparedStatementProxyHandler.execute(PreparedStatementProxyHandler.java:53)
at com.github.gquintana.metrics.sql.AbstractStatementProxyHandler.invoke(AbstractStatementProxyHandler.java:50)
at com.github.gquintana.metrics.proxy.ProxyHandler.invoke(ProxyHandler.java:56)
at com.sun.proxy.$Proxy143.executeQuery(Unknown Source)
at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:79)
at org.hibernate.loader.Loader.getResultSet(Loader.java:2062)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1859)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1838)
at org.hibernate.loader.Loader.doQuery(Loader.java:906)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:348)
at org.hibernate.loader.Loader.doList(Loader.java:2550)
at org.hibernate.loader.Loader.doList(Loader.java:2536)
at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2366)
at org.hibernate.loader.Loader.list(Loader.java:2361)
at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:495)
at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:357)
at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:198)
at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1230)
at org.hibernate.internal.QueryImpl.list(QueryImpl.java:101)
at org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:268)
at org.hibernate.ejb.criteria.CriteriaQueryCompiler$3.getResultList(CriteriaQueryCompiler.java:254)
at org.broadleafcommerce.common.i18n.dao.TranslationDaoImpl.readTranslation(TranslationDaoImpl.java:262)
at org.broadleafcommerce.common.i18n.service.TranslationServiceImpl.getTranslatedValue(TranslationServiceImpl.java:194)
at org.broadleafcommerce.common.i18n.service.DynamicTranslationProvider.getValue(DynamicTranslationProvider.java:47)
at org.broadleafcommerce.core.catalog.domain.CategoryImpl.getName(CategoryImpl.java:489)
at org.<myorg>.CatalogService.lambda$getSeries1ActiveTechniques$11(CatalogService.java:95)
Normally this call getSeries1ActiveTechniques
is behind a hibernate query cache, but with it disabled, there's a lot more traffic to the SQL database.
While I'm aware this design needs refactored, I'm still unclear as to whether this slowdown is due specifically from load via overall CPU, overall GC impacting CPU, or are these two separate contentions causing GC and high CPU?
- CPU sampling during the test shows:
Hotspots show high CPU for certain calls; jumping out is org.apache.catalina.webresources.JarWarResourceSet.getArchiveEntries
which might be a known issue with embedded tomcat?
But how do I know if these CPU times are actually representative or if these methods are just suffering from greater general slowdown from GC or threadpool limiting at apache/jdbc level?