5

I have a java web application deployed under tomcat, and suddenly the response time of a API slowed down as snapshot shows (sorry, I cannot post images as lack of reputations.). It would get back to normal after restarted tomcat.

The only suspect code got my attention was the usage of putIfAbsent of ConcurrentHashMap, so I wrote a test JSP page as following:

ConcurrentHashMap<Integer, Integer> testMap2 = new ConcurrentHashMap<Integer, Integer>();
long putStart2 = System.currentTimeMillis();
for (int i = 0; i < size; i++) {
    testMap2.put(i, data[i]);
}
out.println("concurrent put -> " + (System.currentTimeMillis() - putStart2));
out.println("</br>");

ConcurrentHashMap<Integer, Integer> testMap = new ConcurrentHashMap<Integer, Integer>();
long putStart = System.currentTimeMillis();
for (int i = 0; i < size; i++) {
    testMap.putIfAbsent(i, data[i]);
}
out.println("concurrent putIfAbsent -> " + (System.currentTimeMillis() - putStart));
out.println("</br>");

and got result as:

concurrent put -> 36 
concurrent putIfAbsent -> 157

The JDK version is 1.7.0_45, and I checked the source code of ConcurrentHashMap, putIfAbsent should be faster than or at least the same as put.

I thought the difference might be caused by JIT, so I added JVM options to log compilation:

log for ConcurrentHashMap.putIfAbsent

<task_queued compile_id='211' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='256' iicount='256' level='3' stamp='0.604' comment='tiered' hot_count='256'/>
<nmethod compile_id='211' compiler='C1' level='3' entry='0x00007feaf12d1480' size='2208' address='0x00007feaf12d12d0' relocation_offset='288' insts_offset='432' stub_offset='1552' scopes_data_offset='1792' scopes_pcs_offset='2008' dependencies_offset='2184' nul_chk_table_offset='2192' oops_offset='1768' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='262' iicount='262' stamp='0.606'/>
<task_queued compile_id='2449' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='3458' iicount='3458' level='3' stamp='8.227' comment='tiered' hot_count='3458'/>
<nmethod compile_id='2449' compiler='C1' level='3' entry='0x00007feaf19cfe00' size='2200' address='0x00007feaf19cfc50' relocation_offset='288' insts_offset='432' stub_offset='1552' scopes_data_offset='1784' scopes_pcs_offset='2000' dependencies_offset='2176' nul_chk_table_offset='2184' oops_offset='1768' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='3458' iicount='3458' stamp='8.241'/>
<task_queued compile_id='3842' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='5378' iicount='5378' stamp='15.601' comment='tiered' hot_count='5378'/>
<nmethod compile_id='3842' compiler='C2' level='4' entry='0x00007feaf1db8400' size='13248' address='0x00007feaf1db81d0' relocation_offset='288' insts_offset='560' stub_offset='7760' scopes_data_offset='8072' scopes_pcs_offset='12136' dependencies_offset='12920' handler_table_offset='12992' nul_chk_table_offset='13232' oops_offset='7816' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='38053' iicount='38053' stamp='15.650'/>
<jvms bci='75' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='38063' iicount='38063'/>
<task_queued compile_id='5282' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='38146' iicount='38146' decompiles='1' level='3' stamp='23.188' comment='tiered' hot_count='38146'/>
<nmethod compile_id='5282' compiler='C1' level='3' entry='0x00007feaf2032880' size='2200' address='0x00007feaf20326d0' relocation_offset='288' insts_offset='432' stub_offset='1552' scopes_data_offset='1784' scopes_pcs_offset='2000' dependencies_offset='2176' nul_chk_table_offset='2184' oops_offset='1768' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='38173' iicount='38173' decompiles='1' stamp='23.200'/>
<task_queued compile_id='5448' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='46339' iicount='46339' decompiles='1' stamp='24.862' comment='tiered' hot_count='46338'/>
<nmethod compile_id='5448' compiler='C2' level='4' entry='0x00007feaf2041a00' size='4232' address='0x00007feaf2041850' relocation_offset='288' insts_offset='432' stub_offset='2416' scopes_data_offset='2568' scopes_pcs_offset='3736' dependencies_offset='4088' handler_table_offset='4112' oops_offset='2456' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='46350' iicount='46350' decompiles='1' stamp='24.878'/>
<jvms bci='65' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='46350' iicount='46350' decompiles='1'/>
<jvms bci='65' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='46350' iicount='46350' decompiles='1'/>
<jvms bci='65' method='java/util/concurrent/ConcurrentHashMap putIfAbsent (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='46350' iicount='46350' decompiles='1'/>

log for ConcurrentHashMap.put

<task_queued compile_id='2937' method='java/util/concurrent/ConcurrentHashMap put (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='512' iicount='512' level='3' stamp='9.732' comment='tiered' hot_count='512'/>
<nmethod compile_id='2937' compiler='C1' level='3' entry='0x00007feaf18010c0' size='2200' address='0x00007feaf1800f10' relocation_offset='288' insts_offset='432' stub_offset='1552' scopes_data_offset='1784' scopes_pcs_offset='2000' dependencies_offset='2176' nul_chk_table_offset='2184' oops_offset='1768' method='java/util/concurrent/ConcurrentHashMap put (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='514' iicount='514' stamp='9.735'/>
<task_queued compile_id='5235' method='java/util/concurrent/ConcurrentHashMap put (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='10752' iicount='10752' stamp='22.558' comment='tiered' hot_count='10752'/>
<nmethod compile_id='5235' compiler='C2' level='4' entry='0x00007feaf20a4b40' size='4208' address='0x00007feaf20a4990' relocation_offset='288' insts_offset='432' stub_offset='2384' scopes_data_offset='2536' scopes_pcs_offset='3712' dependencies_offset='4064' handler_table_offset='4088' oops_offset='2424' method='java/util/concurrent/ConcurrentHashMap put (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' bytes='79' count='10766' iicount='10766' stamp='22.570'/>

I cannot understand the log entirely, does it mean putIfAbsent is deoptimized? but I added

-XX:-UseCodeCacheFlushing -XX:ReservedCodeCacheSize=80m

Why did deoptimization happen on putIfAbsent?

Ivan
  • 61
  • 2
  • 1
    What about running the test JSP after tomcat restart? Are results the same? How much is `size` parameter? Note that CHM throughput may lower when contention is increased (for example, more threads trying to modify it intensively). – Tagir Valeev Sep 29 '15 at 06:27
  • How many entries are in the map after a day or more? How many have been deleted and updated? Any other possibility of *memory* consumption during 24 hours? - It is not unheard of that a slowdown occurs when GC suddenly has to work much more. – laune Sep 29 '15 at 06:29
  • In real product usage, the map is a temparory variable, once for two threads to merge data, but now it is only used by one thread (yes, legacy code use concurrent map in single thread situation), the size is range from 1000 ~ 10000. In my test page the size is 100000, and accessed by single thread also. It is not gc issue, because gc is normal, but putIfAbsent is always slow after suddenly got slow before restart tomcat. After restart tomcat, test result is concurrent put -> 35 concurrent putIfAbsent -> 35 – Ivan Sep 29 '15 at 07:02
  • no entry was deleted or updated for this temporary map in both product usage or test page, and both accessed by single thread. After restart tomcat the test result is normal, there is no different between response time of put and putIfAbsent, around 35ms. – Ivan Sep 29 '15 at 07:09
  • Did you try switching the order? – matt Sep 29 '15 at 07:18
  • Yes, there is no difference. The test result I posted was after I changed the order of invoking of put and putIfAbsent. I have tried with both put before putIfAbsent and reverse. – Ivan Sep 29 '15 at 07:25
  • If I run the test at 8.3 million elements, putIfAbsent is faster. At 16.7 million elements, put is faster. – matt Sep 29 '15 at 07:32
  • From source code, put and putIfAbsent is almost same. There should not be so big difference between them, and they performed same well after I restarted tomcat. – Ivan Sep 29 '15 at 07:41
  • 1
    Note that `System.currentTimeMillis()` is not precise (i.e. it does not necessarily return exactly the correct time to millisecond precision). Its exact precision is a platform-specific implementation detail. In your benchmark `.put()` took 36ms and `.putIfAbsent()` took 157ms - that's well within the realm of possible clock imprecision. Benchmarking Java is hard and benchmarking *concurrent* Java is even harder, and at a glance this test isn't robust. See http://stackoverflow.com/q/504103/113632 for some suggestions. – dimo414 Nov 24 '15 at 20:58

1 Answers1

0

(I think the -XX:+PrintCompilation would be more readable, although less detailed. This xml log is really only useful in jitwatch)

I guess the c2 has deoptimized because something new showed up, like a new code path not taken before. The for loop in the ConcurrentHashMap.putVal() underneath is complex and thus exposed to more decompilation due to the high number of paths. The jit installs what an uncommon trap in branches less frequent. So for a moment the code is back to interpreted.

Jitwatch should indicate some reasons in the suggestions tab. Lot's of good JIT videos on youtube from Douglas Hawkins, explaining the jit internals. They are session at various conferences, and imho worth every minute of your attention. I can't recall all the rules, and they are not all documented.

Hope this helps, even though it's not an answer really.

user2023577
  • 1,752
  • 1
  • 12
  • 23