After a very long hunt for a related bug, I came across this strange behavior:

If on Linux I run a single JNI method to do a select:

JNIEXPORT void JNICALL Java_SelectJNI_select(JNIEnv *env, jobject thisObj) {
  // Print the current PID
  fprintf(stderr, "PID: %d\n", getpid());

  // Wait for 30 seconds
  struct timeval *timeout = (struct timeval *) calloc(1, sizeof(struct timeval));
  timeout->tv_sec = 30;
  timeout->tv_usec = 0;
  select(0, NULL, NULL, NULL, timeout);

  return;
}

and then run the program under strace, the select is not executed by the PID I printed but by the PID of a child, while the original process just waits on a mutex (this doesn't happen if I make the same call in a small plain C program).

Say strace -f -o strace_output.txt java SelectJNI prints:

PID: 46811 

then grep select\( strace_output.txt will return:

46812 select(0, NULL, NULL, NULL, {tv_sec=30, tv_usec=0} <unfinished ...>

My guess is that JNI is forking and in some way replacing the original select with its own wrapped version, probably to remain responsive.

I have a lot of questions, but the ones I care most about are:

  1. Is my hypothesis correct? JNI replacing functions under my feet?
  2. Is this behavior documented somewhere?
  3. The process where the actual select is invoked seems always to be that of the first child. Can I rely on that? If not, how do I find out where select is actually running?
Rick77
  • Have you confirmed that the parent isn't *immediately* forking to set up the many threads expected in a JVM? – nanofarad Feb 09 '21 at 15:34
  • I may not have understood your remark, but I don't _think_ it's the case: if the parent had forked already, the pid reported by the printf and the one in strace would be identical. – Rick77 Feb 09 '21 at 15:35
  • My surprise comes from the fact that the process seems to either fork or delegate the select call _after_ the fprintf, that is, when select is called. Again, maybe I didn't get your remark, though. – Rick77 Feb 09 '21 at 15:36
  • 46812 in the strace output is a TID, not a PID, if I recall correctly. Print the result of `gettid` instead. – nanofarad Feb 09 '21 at 15:36
  • it would make sense, thank you @nanofarad! – Rick77 Feb 09 '21 at 15:37
  • @Rick77 off-topic, but instead of `grep select\(` you can do `strace -f -e select ...` – rkosegi Feb 09 '21 at 15:50
  • thank you @rkosegi: It will be useful! – Rick77 Feb 09 '21 at 16:07

2 Answers

The JVM may indeed spawn children, but they are new JVM threads rather than whole processes (on Linux, threads are created with clone, and strace -f follows them just as it follows forked children). 46811 is the PID; the thread actually running your code has TID 46812, which is what strace prints, and it still belongs to PID 46811. Replacing getpid with gettid in the sample should make the output consistent.
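
The PID/TID distinction can be reproduced without a JVM. The following is only a minimal sketch, assuming Linux and pthreads; `struct ids`, `current_tid`, `record_ids` and `ids_from_new_thread` are hypothetical names made up for this illustration. It uses `syscall(SYS_gettid)` because glibc only gained a `gettid()` wrapper in version 2.30.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* IDs as observed from inside one thread. */
struct ids {
    pid_t pid;  /* process ID, shared by every thread of the process */
    pid_t tid;  /* kernel thread ID, unique per thread (what strace -f prints) */
};

/* Hypothetical helper: thread ID of the caller via the raw system call. */
static pid_t current_tid(void) {
    return (pid_t) syscall(SYS_gettid);
}

/* Thread body: record the IDs this thread sees. */
static void *record_ids(void *arg) {
    struct ids *out = arg;
    out->pid = getpid();
    out->tid = current_tid();
    return NULL;
}

/* Hypothetical helper: run record_ids in a new thread and wait for it.
   Returns 0 on success, or the pthread error code. */
static int ids_from_new_thread(struct ids *worker) {
    pthread_t thread;
    int rc = pthread_create(&thread, NULL, record_ids, worker);
    if (rc != 0) {
        return rc;
    }
    return pthread_join(thread, NULL);
}
```

Calling `ids_from_new_thread` from the main thread should yield a `pid` equal to the caller's `getpid()` and a `tid` that differs from the caller's own thread ID, which mirrors the 46811 versus 46812 split in the strace output. Build with `-pthread` where the C library still needs it.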

nanofarad
  • Thank you @nanofarad, I can confirm after printing the ``gettid()`` result that this is the case. I'm a bit surprised, though, that (in a more complex example) I was able to send a SIGINT to the process by using the TID and not the PID: does "kill" play well with TIDs too? – Rick77 Feb 09 '21 at 15:40
  • @Rick77 A signal can be routed to a *process* by using any TID within the process ([ref](https://stackoverflow.com/questions/22005719/which-thread-handles-the-signal)), but which thread *handles* the signal depends on whether the signal is per-process, per-thread, etc. (see linked ref) – nanofarad Feb 09 '21 at 15:41

I want to elaborate on the accepted answer by @nanofarad and address the 3 points of my own question explicitly.

My guess is that JNI is forking and in some way replacing the original select with its own wrapped version, probably to remain responsive. [...]

  1. Is my hypothesis correct? JNI replacing functions under my feet?

No, it is not.

There is nothing special about the select executed through JNI.

The hypothesis that JNI was replacing it with "something that forks the process" was wrong: I just mistook the TID printed by strace for a PID.

JNI just executes the select in the Java thread.

  2. Is this behavior documented somewhere?

No need: since the JNI call is executed in the calling Java thread, there is nothing to document on the matter.

  3. The process where the actual select is invoked seems always to be that of the first child (et cetera...)

The TID of the first spawned thread appears to always equal PID + 1. That is a likely outcome (the Java thread is created right after the runtime starts), but it is not guaranteed.
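
Rather than assuming PID + 1, the thread IDs can be looked up. On Linux every thread of a process appears as a directory under `/proc/self/task`, so a rough sketch (assuming `/proc` is mounted; `is_own_thread` and `list_own_threads` are hypothetical helper names, not part of JNI or any library) could look like this:

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if `tid` names a thread of the calling
   process, 0 otherwise. Relies on the Linux /proc/self/task/<tid> layout. */
static int is_own_thread(pid_t tid) {
    char path[64];
    snprintf(path, sizeof path, "/proc/self/task/%ld", (long) tid);
    return access(path, F_OK) == 0;
}

/* Hypothetical helper: prints every TID of the calling process to `out`.
   Returns the number of threads found, or -1 if /proc cannot be read. */
static int list_own_threads(FILE *out) {
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL) {
        return -1;
    }
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.') {
            continue;  /* skip "." and ".." */
        }
        fprintf(out, "TID: %s\n", entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

Called from inside the native method, `is_own_thread((pid_t) syscall(SYS_gettid))` should return 1, and `list_own_threads(stderr)` would print the IDs that `strace -f` can show in its first column for that JVM.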

Rick77