
We have an Ada shared library, compiled with GNAT Pro 19.2, that we call from Java through JNA.

Our application runs fine under Windows. After porting it to Linux, the application crashes randomly with an Ada exception:

storage error or erroneous memory access.

Debugging with gdb (attaching to the process) doesn't help much. We get various SIGSEGVs, we continue, and after a while we get the storage error with no usable call stack.

Our shared library can be used through a native call from Python without any issues whatsoever, so the issue is probably on the Java side.

We tried switching JVMs (OpenJDK or the official JDK) without luck.

Why is this? Is there a way to work around it?

Jean-François Fabre

1 Answer


The first hint is that gdb reports a bunch of SIGSEGVs once it is attached to the application, yet the program resumes normally when continued.

It means that the SIGSEGV signal is handled on the Java side, as confirmed in "Why does java app crash in gdb but runs normally in real life?":

Java uses speculative loads. If a pointer points to addressable memory, the load succeeds. Rarely the pointer does not point to addressable memory, and the attempted load generates SIGSEGV ... which java runtime intercepts, makes the memory addressable again, and restarts the load instruction.

What happens is that, by default, the GNAT run-time installs its own signal handler to catch SIGSEGV and convert it into a clean Ada exception. One useful feature of Ada exceptions is that they can print the stack trace, even without a debugger, and this SIGSEGV handler redirection is what makes that possible for memory faults.

But since Java uses speculative loads, SIGSEGVs are expected from time to time on the Java side. Once the Ada shared library has been loaded and initialized, the Ada SIGSEGV handler is installed; it catches those "normal" SIGSEGVs and aborts immediately.

Note that this doesn't happen under Windows. The Java runtime probably cannot use this speculative load mechanism there because of Windows limitations in handling memory access violations.
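To make the handler replacement concrete, here is a minimal sketch in Python (not one of the languages actually involved, just convenient to run). It assumes a POSIX system and Python 3.8+, and it uses SIGUSR1 as a harmless stand-in for SIGSEGV. The handler names jvm_like_handler and ada_like_handler are hypothetical and only mimic the roles described above. It shows that a process has one handler per signal, so whoever installs last wins:

```python
import signal

# Assumption: SIGUSR1 stands in for SIGSEGV so the sketch is safe to run.
SIG = signal.SIGUSR1
events = []

def jvm_like_handler(signum, frame):
    # Hypothetical stand-in for the JVM handler: the fault is expected,
    # so it recovers and lets execution resume.
    events.append("jvm: recovered")

def ada_like_handler(signum, frame):
    # Hypothetical stand-in for the GNAT handler: the fault is turned
    # into an error (Storage_Error in the real run-time).
    events.append("ada: Storage_Error")

# 1. The JVM starts first and installs its handler.
signal.signal(SIG, jvm_like_handler)
signal.raise_signal(SIG)  # reaches the JVM stand-in

# 2. The Ada library is loaded later; its initialization installs a new
#    handler, which silently replaces the previous one.
previous = signal.signal(SIG, ada_like_handler)
signal.raise_signal(SIG)  # now reaches the Ada stand-in instead

print(events)
print("replaced the JVM-like handler:", previous is jvm_like_handler)
```

After step 2 the first handler is never called again, which is the situation the JVM ends up in once the Ada run-time has been initialized.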

The signal handling is done in s-intman.adb, at the end of Notify_Exception:

   --  Check that treatment of exception propagation here is consistent with
   --  treatment of the abort signal in System.Task_Primitives.Operations.

   case signo is
      when SIGFPE  => raise Constraint_Error;
      when SIGILL  => raise Program_Error;
      --  when SIGSEGV => raise Storage_Error;  --  commenting out this line should fix it
      when SIGBUS  => raise Storage_Error;
      when others  => null;
   end case;
end Notify_Exception;

With that change, we'd have to rebuild a new native run-time and use it instead of the default one, which is pretty tedious and error-prone. That file is part of the gnarl library, so we'd have to rebuild gnarl as a shared library with the proper options (-gnatp -nostdinc -O2 -fPIC) to create a substitute gnarl library, and do it all again whenever the compiler is upgraded.

Fortunately, an alternate solution was provided by AdaCore:

First, create a configuration pragmas file in the .gpr project directory (let's call it no_sigsegv.adc) containing:

pragma Interrupt_State (SIGSEGV, SYSTEM); 

This instructs the run-time not to install the SIGSEGV handler.

Then add this to the Compiler package of the .gpr file:

  package Compiler is
    ...
    for Local_Configuration_Pragmas use Project'Project_Dir & "/no_sigsegv.adc";
  end Compiler;

Then rebuild everything from scratch. In our testing afterwards, there was not a single crash.

Jean-François Fabre
  • I was looking for a real use case for pragma config files like 3 days ago :) If I understand correctly, both [SIGSEGV while debugging] and [initial storage error problems] are gone, aren't they? – LoneWanderer Jul 01 '20 at 08:57
  • SIGSEGV while debugging is still there. It's a Java issue, not an Ada issue. But we don't need to debug the application since it works :) And if it fails, we can debug it on the Windows side, where Java doesn't use this mechanism. – Jean-François Fabre Jul 01 '20 at 08:58
  • @LoneWanderer the workaround from the linked answer works: `(gdb) handle SIGSEGV nostop noprint pass`. That way you can debug your app without being interrupted by spurious SIGSEGVs. Most of the time Ada programs don't trigger SIGSEGV, provided you kept the Ada run-time checks enabled. – Jean-François Fabre Jul 01 '20 at 17:38