
We have Java/Scala apps deployed in production, with alerts set up for whenever CPU or memory usage spikes.

Occasionally we see a huge spike in CPU or memory usage. Sometimes the application, which runs on Play, stops responding to requests.

I usually check the logs for the last few API hits before the crash. That is how I recently figured out that one API was downloading a huge dump of data and exhausting memory.

Can I get tips for troubleshooting such issues in general (commands or tools to capture stats) when things go wrong in prod?

Sunil Rajashekar
  • 1
    For testing, you can use jvisualvm to hook into the live process and watch it live. It provides memory and cpu sampling and ships with the JDK. – kutschkem Feb 28 '18 at 15:56
  • 1
    [This book](https://www.rockvalleycollege.edu/webadmin/upload/Top-10-Java-Performance-Problems.pdf) is a perfect starting point – senape Feb 28 '18 at 16:00
  • 1
    You may want to do some research into java profiling tools. The answers here run the risk of turning into an opinion-based fight for what everyone's favorite profiler is. That said, Vinay's answer is good and universally applicable. – Brandon McKenzie Feb 28 '18 at 16:19

1 Answer


Troubleshooting this kind of issue takes a lot of experience, but below are some steps that you could follow:

Prerequisite:

  1. You should understand how the JVM lays out memory, i.e. what the New Generation (Eden and the two Survivor spaces), the Old Generation, Metaspace, the heap, thread stacks, etc. are.

    Read this to understand it better.

  2. You should understand how garbage collection works, e.g. how the mark-and-sweep algorithm works. The same link as above covers this too.
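To connect the prerequisites above to a running process, here is a minimal sketch that asks the JVM itself which memory pools and collectors it has, using the standard `java.lang.management` API. The class name `MemoryPoolReport` is made up for illustration, and the pool names it prints (Eden, Survivor, Old Gen, Metaspace and so on) depend on the JVM and the garbage collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, named for illustration only.
final class MemoryPoolReport {

    /** Builds one line per memory pool and per garbage collector of this JVM. */
    static String describe() {
        StringBuilder out = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getType() tells heap pools apart from non-heap pools.
            out.append(String.format("%-35s %-16s used=%d KB%n",
                    pool.getName(), pool.getType(), usage.getUsed() / 1024));
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Cumulative collection count and time since the JVM started.
            out.append(String.format("GC %-32s count=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running it under different collectors is a quick way to see how the generations described above map onto actual pools.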

Now you could install VisualVM. In VisualVM, also install the Visual GC plugin; it adds a Visual GC tab that shows the memory used in each space.

[Screenshot: Visual GC tab in VisualVM]

i) Observe the graphs in the Monitor tab (the Heap graph is at the top right in the snapshot below).

[Screenshot: Monitor tab in VisualVM]

**Trick:** You could also trigger a GC manually to see how steeply the Used Heap line climbs and how quickly the heap fills up while some block of code runs. I have used this many times and it really helps, especially together with the debugger!

ii) Also, look at the thread dump if you suspect multithreading is causing the issue.

iii) In any case, you could also do some profiling or sampling via the Profiler and Sampler tabs.

Below is a snapshot of the Sampler. See how clearly it shows how much memory each data type takes:

[Screenshot: Sampler tab showing memory used per data type]

Important: the screenshot shows the heap view. You could switch to the Per Thread Allocation tab to see allocation per thread. Similarly, you could observe CPU consumption.
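When attaching VisualVM to a process is not an option, the thread dump from step ii and a rough per-thread CPU figure can also be collected from inside the JVM. The following is only a sketch using the standard `ThreadMXBean`; the class name `ThreadDumpSketch` is hypothetical, and CPU time reads "n/a" on JVMs where thread CPU timing is unsupported or disabled.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class, named for illustration only.
final class ThreadDumpSketch {

    /** Returns a text dump of all live threads with state, CPU time and stack. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuTimes = threads.isThreadCpuTimeSupported()
                && threads.isThreadCpuTimeEnabled();
        StringBuilder out = new StringBuilder();

        // false, false: skip locked monitor and synchronizer details.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            // getThreadCpuTime returns -1 if the thread has already died.
            long cpuNanos = cpuTimes ? threads.getThreadCpuTime(info.getThreadId()) : -1;
            String cpu = cpuNanos < 0 ? "n/a" : (cpuNanos / 1_000_000) + " ms";
            out.append(String.format("\"%s\" state=%s cpu=%s%n",
                    info.getThreadName(), info.getThreadState(), cpu));
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append(System.lineSeparator());
            }
        }

        // Both calls return null when no deadlock is detected.
        long[] deadlocked = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        out.append("deadlocked threads: ")
                .append(deadlocked == null ? 0 : deadlocked.length)
                .append(System.lineSeparator());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

From the command line, `jstack <pid>` (shipped with the JDK) gives a comparable dump without any code changes.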

  1. Alternatively, use JMeter if you cannot reproduce the problem locally. JMeter can help you load test your application extensively.

  2. Also, if you have integrated a server monitoring tool, that could help too. It can notify you about problematic code.

  3. Finally, you could take a heap dump on the production system, download it, and analyze it locally using VisualVM.

    jmap -J-d64 -dump:format=b,file=<heap_dump_filename> <pid>

    This link has more detailed answers on the same topic from some really cool developers.

  4. Use jstat. It ships with the JDK and is very handy sometimes.

    jstat -gc 2341   # 2341 is the Java process ID
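As a complement to the `jmap` command in step 3, a heap dump can also be triggered from inside the application, for example from an admin-only endpoint. This is a sketch that assumes a HotSpot-based JVM, since `HotSpotDiagnosticMXBean` lives in `com.sun.management` and is not part of the portable Java API. The class name `HeapDumpSketch` and the file names are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, named for illustration only; assumes a HotSpot JVM.
final class HeapDumpSketch {

    /** Writes a heap dump to the given file, which must not exist yet. */
    static Path dumpHeap(Path target) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // true = dump only reachable objects; this may trigger a full GC first.
        diagnostics.dumpHeap(target.toString(), true);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // A fresh temp directory guarantees the target file does not exist.
        Path dir = Files.createTempDirectory("heapdump");
        // Recent JDKs require the file name to end in ".hprof".
        Path file = dumpHeap(dir.resolve("app.hprof"));
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

The resulting `.hprof` file can be opened in VisualVM like a dump produced by `jmap`. Dumping pauses the application and the file can be as large as the heap, so on a production system it is worth checking free disk space first.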

These are from my experience. There is never enough to learn in this direction, and my knowledge keeps evolving as I face more such issues. So please practice and explore further.

That said, other tools are available, so feel free to find the ones that suit your needs. To get started, take a look at JConsole.

Vinay Prajapati