I've been running a Spark application, and one of its stages failed with a FetchFailedException. At roughly the same time, a log entry similar to the following appeared in the Resource Manager logs.
<date> <time>,988 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: User=<user> OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=<appid> CONTAINERID=<containerid>
My application was using more resources than YARN had allocated to it; however, it had been running for several days. What I suspect happened is that other applications started and wanted to use the cluster, and the Resource Manager killed one of my containers to give the resources to them.
Can anyone help me verify my assumption and/or point me to the documentation that describes the log messages that the Resource Manager outputs?
Edit: If it helps, the YARN version I'm running is 2.6.0-cdh5.4.9.