I am running a Java process on Amazon EC2. It ran for 72 minutes and then suddenly I got "java result 137". That is all; there are no exceptions or any other error messages. I have searched for this error but couldn't find anything useful. What could be causing it, and how can I resolve it?
-
What process are you running? Maybe that is just the status the process terminated with; it may not be an error. – Kevin Mangold Aug 03 '12 at 19:05
-
I believe the number after "Java result" is the value passed to System.exit(int) when the code terminates execution. The convention normally is that any exit code other than zero indicates an error, but it's poor form that there are no error messages to help you debug the situation. – Bobulous Aug 03 '12 at 19:08
-
@KevinMangold I am inserting records into MongoDB on a cluster. There are 4 shards on 4 machines, and this Java process inserts documents by connecting to mongos (the router that routes documents to the shards). My code does call System.exit(), but only with -1, when an error condition is met. Thank you for tagging amazon-ec2 as well :), I should have done it while posting. – Raghava Aug 03 '12 at 19:21
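For what it's worth, a -1 passed to System.exit() cannot by itself show up as 137: on POSIX systems only the low 8 bits of an exit status reach the parent, so -1 is reported as 255. Below is a minimal C sketch, assuming a POSIX system, of what a parent observes when a child exits with a given code; `observed_exit_status` is a hypothetical helper name, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits immediately with `code` and returns the exit
 * status the parent sees via waitpid(), or -1 if fork/waitpid fails or the
 * child did not exit normally. */
int observed_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: only code & 0xFF reaches the parent */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this helper, observed_exit_status(-1) yields 255, so a reported 137 points to either an explicit exit(137) or, more likely, death by signal 9.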
-
@KevinMangold Forgot to mention: it is definitely an abrupt termination, because the process hadn't finished inserting all the required records. – Raghava Aug 03 '12 at 19:29
-
Are there any errors in the log files of the MongoDB instances (mongos and the mongods) at the time that you got the "java result 137"? If so, could you please paste those log excerpts in http://pastie.org/ or http://pastebin.com/, and put the links here, and we can take a look. – Ian Daniel Aug 06 '12 at 01:09
-
Possible duplicate of [Why does my Perl script exit with 137?](https://stackoverflow.com/questions/1041182/why-does-my-perl-script-exit-with-137) – kenorb Apr 03 '19 at 16:48
2 Answers
Exit codes above 128 typically mean the process was terminated by a signal.
Exit code 137 resolves to 128 + 9, and signal 9 is SIGKILL, i.e. the process was forcefully killed. Among other things, this can be caused by a "kill -9" command. In your case, however, it could be an out-of-memory condition in the operating system, which causes a mechanism called the "OOM killer" to kill the process using the most memory so that the OS itself stays stable.
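The arithmetic above can be sketched in C as a small decoder. This is a heuristic based on the 128 + signal convention only: `signal_from_exit_code` and `explain_exit_code` are hypothetical helper names, and a program that deliberately calls exit(137) would be indistinguishable from one killed by SIGKILL.

```c
#define _POSIX_C_SOURCE 200809L   /* for strsignal() */
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Heuristic: an exit code in 129..255 is read as "terminated by signal
 * (code - 128)". Returns that signal number, or 0 for an ordinary exit. */
int signal_from_exit_code(int code)
{
    return (code > 128 && code < 256) ? code - 128 : 0;
}

/* Prints a one-line interpretation of an exit code. */
void explain_exit_code(int code)
{
    int sig = signal_from_exit_code(code);
    if (sig != 0)
        printf("exit code %d = 128 + %d (%s)\n", code, sig, strsignal(sig));
    else
        printf("exit code %d: ordinary exit status\n", code);
}
```

Calling explain_exit_code(137) prints 128 + 9 together with the C library's description of signal 9 (SIGKILL).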
See this question for a similar discussion.
In case anyone wants to know where this 128 comes from, the reason can be found in the OpenJDK source code; see UNIXProcess_md.c.
From the comments in the Java_java_lang_UNIXProcess_waitForProcessExit method:
The best value to return is 0x80 + signal number, because that is what all Unix shells do, and because it allows callers to distinguish between process exit and process death by signal.
So that is why the JVM developers decided to add 128 to the child's exit status when the child is terminated by a signal (on Solaris, the raw signal number is kept for compatibility).
Here is the function responsible for returning the status of the child process:
```c
/* Headers this excerpt needs (added here; the full OpenJDK file includes more). */
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <jni.h>

/* Block until a child process exits and return its exit code.
   Note, can only be called once for any given pid. */
JNIEXPORT jint JNICALL
Java_java_lang_UNIXProcess_waitForProcessExit(JNIEnv* env,
                                              jobject junk,
                                              jint pid)
{
    /* We used to use waitid() on Solaris, waitpid() on Linux, but
     * waitpid() is more standard, so use it on all POSIX platforms. */
    int status;
    /* Wait for the child process to exit.  This returns immediately if
       the child has already exited. */
    while (waitpid(pid, &status, 0) < 0) {
        switch (errno) {
        case ECHILD: return 0;
        case EINTR: break;
        default: return -1;
        }
    }

    if (WIFEXITED(status)) {
        /*
         * The child exited normally; get its exit code.
         */
        return WEXITSTATUS(status);
    } else if (WIFSIGNALED(status)) {
        /* The child exited because of a signal.
         * The best value to return is 0x80 + signal number,
         * because that is what all Unix shells do, and because
         * it allows callers to distinguish between process exit and
         * process death by signal.
         * Unfortunately, the historical behavior on Solaris is to return
         * the signal number, and we preserve this for compatibility. */
#ifdef __solaris__
        return WTERMSIG(status);
#else
        return 0x80 + WTERMSIG(status);
#endif
    } else {
        /*
         * Unknown exit code; pass it through.
         */
        return status;
    }
}
```
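To see the convention in action, here is a minimal POSIX sketch (not OpenJDK code; `shell_style_status` and `status_of_child_killed_by` are hypothetical names) that mirrors the non-Solaris branch of the function above: it forks a child, lets the child send itself a signal, and reports 0x80 + the signal number. Unlike the original, it returns -1 for every waitpid() error other than EINTR, including ECHILD.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Waits for `pid` and maps its status like the non-Solaris JVM code:
 * normal exit -> exit status, death by signal -> 0x80 + signal number. */
int shell_style_status(pid_t pid)
{
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)     /* retry only when interrupted */
            return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 0x80 + WTERMSIG(status);
    return status;              /* unknown status: pass it through */
}

/* Starts a child that sends `sig` to itself and returns its mapped status. */
int status_of_child_killed_by(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        kill(getpid(), sig);    /* child: deliver the signal to itself */
        _exit(0);               /* reached only if the signal did not kill it */
    }
    return shell_style_status(pid);
}
```

With SIGKILL this reports 137 (0x80 + 9), the same value seen in the question; with SIGTERM it reports 143.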