We have a Riemann JVM whose CPU usage currently sits between 50% and 65%; it has climbed gradually over time as load has increased. Occasionally it now spikes to around 90-95%. The VM is a 48-core, 192GB machine. Memory consumption does not exceed 80GB.
We were hoping to scale up to the next larger VM, with 64 CPUs and 256GB of memory. Strangely, as soon as we switch to the bigger VM, the JVM stays at 100% CPU constantly, with a system load close to 100, and the JVM process becomes unresponsive. (Both VMs are the same M5 type.)
We had to scale back down to the 48-CPU machine.
The main question: on a smaller VM (48 CPUs, both m5 and c5), the application is stable with CPU consumption at 40-65%, but after scaling up to a 72-CPU machine, the application becomes unusable at 100% CPU. Switching back to the smaller VM makes everything stable again. What could explain this?
The application in question is Riemann.
Additional pointers:
OpenJDK 64-Bit Server VM version 17.0.3.0.1+7-LTS
Configuration:
MAX_HEAP=$(awk '/MemTotal/ { printf "%.0f", $2/1024*0.75 }' /proc/meminfo)
HALF_MAX_HEAP=$(( MAX_HEAP / 2 ))
exec chpst -u vcap:vcap java \
-XX:+UseParallelGC \
-XX:+ExitOnOutOfMemoryError \
-Xms"$HALF_MAX_HEAP"m \
-Xmx"$MAX_HEAP"m \
-Djava.io.tmpdir="$TMP_DIR" \
-jar /var/vcap/packages/riemann/riemann.jar \
"$CONFIG_DIR"/clojure/riemann.config \
1>>"$LOG_DIR"/"$JOB_NAME".stdout.log \
2>>"$LOG_DIR"/"$JOB_NAME".stderr.log &
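
For reference, here is a rough sketch of what the heap arithmetic in the start script works out to on each VM size. It assumes MemTotal reports the nominal RAM (in practice it is slightly lower), and the heap_mb helper and the sample sizes are illustrative only, not part of the actual script:

```shell
#!/bin/sh
# Hypothetical helper: same 75%-of-MemTotal formula as the start script,
# but taking MemTotal (in kB) as an argument instead of reading /proc/meminfo.
heap_mb() {
    awk -v kb="$1" 'BEGIN { printf "%.0f", kb / 1024 * 0.75 }'
}

# Nominal MemTotal values in kB (assumption: real values are slightly lower).
for gb in 192 256; do
    kb=$(( gb * 1024 * 1024 ))
    max=$(heap_mb "$kb")
    # -Xms is half of -Xmx, as in the start script.
    echo "${gb}GB VM: -Xms$(( max / 2 ))m -Xmx${max}m"
done
```

Under those assumptions, moving from the 192GB VM to the 256GB VM raises the maximum heap from about 144GB to about 192GB and the initial heap from about 72GB to about 96GB, even though actual usage stays under 80GB.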