
I'm trying to access a native (Fortran) library (mylib.so) loaded by JNA. The library is accessed in parallel by a Spark job. So far I have not synchronized the method calls (nor the library), because the calls into the shared library are the bottleneck of my computation and they must run in parallel.

I get the following error:

```
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ffbcb5f8dcd, pid=58569, tid=140708155152128
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
*** Error in `/usr/java/jdk1.8.0_60/jre/bin/java': double free or corruption (!prev): 0x0000000001b756d0 ***
*** Error in `/usr/java/jdk1.8.0_60/jre/bin/java': free(): corrupted unsorted chunks: 0x0000000001b75010 ***
*** Error in `/usr/java/jdk1.8.0_60/jre/bin/java': double free or corruption (!prev): 0x0000000001b756d0 ***
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x38dcd]======= Backtrace: =========
======= Backtrace: =========
======= Backtrace: =========
/lib64/libc.so.6(+0x7d053)[0x7ffbcb63d053]
/lib64/libc.so.6(+0x7d053)[0x7ffbcb63d053]
/lib64/libc.so.6(+0x7d053)[0x7ffbcb63d053]
/lib64/libc.so.6(+0x38e90)[0x7ffbcb5f8e90]
/usr/java/jdk1.8.0_60/jre/lib/amd64/server/libjvm.so(+0x5d43f9)[0x7ffbcabba3f9]
/lib64/libc.so.6(+0x38e90)[0x7ffbcb5f8e90]
/lib64/libc.so.6(+0x38eb5)[0x7ffbcb5f8eb5]
/lib64/libc.so.6(+0x38e69)[0x7ffbcb5f8e69]
/lib64/libc.so.6(+0x38eb5)[0x7ffbcb5f8eb5]
  __run_exit_handlers+0x3d
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
/opt/usr/local/lib/mylib.so(for_exit+0x19)[0x7ff9285b0fa9]
/lib64/libc.so.6(+0x38eb5)[0x7ffbcb5f8eb5]
/opt/usr/local/lib/mylib.so(for_exit+0x19)[0x7ff9285b0fa9]
/opt/usr/local/lib/mylib.so(for__once_private+0x27)[0x7ff9285660b7]
/opt/usr/local/lib/mylib.so(for__once_private+0x27)[0x7ff9285660b7]
/opt/usr/local/lib/mylib.so(for__acquire_lun+0x814)[0x7ff92855ea04]
/opt/usr/local/lib/mylib.so(for__acquire_lun+0x814)[0x7ff92855ea04]
/opt/usr/local/lib/mylib.so(for_write_int_fmt+0x9c)[0x7ff92857f83c]
/opt/usr/local/lib/mylib.so(for__acquire_lun+0x814)[0x7ff92855ea04]
/opt/usr/local/lib/mylib.so(for_write_int_fmt+0x9c)[0x7ff92857f83c]
/opt/usr/local/lib/mylib.so(seterr_+0x1c8)[0x7ff928515328]
/opt/usr/local/lib/mylib.so(for_write_int_fmt+0x9c)[0x7ff92857f83c]
/opt/usr/local/lib/mylib.so(seterr_+0x1c8)[0x7ff928515328]
/opt/usr/local/lib/mylib.so(ddl2sf_+0x322)[0x7ff92850eb02]
/opt/usr/local/lib/mylib.so(seterr_+0x1c8)[0x7ff928515328]
/opt/usr/local/lib/mylib.so(entsrc_+0x80)[0x7ff92850edb0]
/opt/usr/local/lib/mylib.so(bspline_+0x52c)[0x7ff92850d66c]
/opt/usr/local/lib/mylib.so(ddl2sf_+0x2e7)[0x7ff92850eac7]
/opt/usr/local/lib/mylib.so(enter_+0x55)[0x7ff92850ecd5]
/opt/usr/local/lib/mylib.so(rspbsp_+0x4a)[0x7ff928523cda]
/opt/usr/local/lib/mylib.so(bspline_+0x52c)[0x7ff92850d66c]
/opt/usr/local/lib/mylib.so(ddl2sf_+0x16d)[0x7ff92850e94d]
/tmp/jna--1845237309/jna4302334124297214663.tmp(ffi_call_unix64+0x4c)[0x7ff92909465c]
/tmp/jna--1845237309/jna4302334124297214663.tmp(ffi_call+0x1d4)[0x7ff929094164]
/opt/usr/local/lib/mylib.so(bspline_+0x52c)[0x7ff92850d66c]
/opt/usr/local/lib/mylib.so(rspbsp_+0x4a)[0x7ff928523cda]
/tmp/jna--1845237309/jna4302334124297214663.tmp(+0x5870)[0x7ff929087870]
/tmp/jna--1845237309/jna4302334124297214663.tmp(Java_com_sun_jna_Native_invokeVoid+0x22)[0x7ff92908a462]
[0x7ffbb5015994]
```

AFAIK this has to do with the native library not being thread-safe? From the stack trace it seems to me that the actual problem is in libc.so, or is it in my own library, mylib.so?

If the problem is in my own library, would it be possible to work around it by making multiple physical copies of the shared object, for example one per thread?

Raphael Roth
  • You _can_ make multiple copies of your library; each needs to have a unique image on disk (and thus a unique name). That'd be worth trying with just two threads. – technomage Jul 26 '16 at 11:25
  • While this is true, I wouldn't generally consider it a viable approach: it doesn't really scale and is quite a hack. – Martin Serrano Jul 26 '16 at 19:21
  • @technomage I'm really confused now. I made multiple copies of my library with unique names and loaded each of them with JNA into a different variable. The image IS loaded twice, but the copies still seem to share memory: if I modify a global variable in one library, the second library sees the same value (by global I mean the variable lives in a COMMON block). Is that to be expected? – Raphael Roth Jul 27 '16 at 10:00
  • You may need to send some explicit library load flags. [`dlopen`](http://linux.die.net/man/3/dlopen) can take a number of flags; JNA uses `RTLD_LAZY|RTLD_GLOBAL` by default on most unix-like systems. You can pass in an explicit value via `Library.OPTION_OPEN_FLAGS` (an int value you'll have to look up on your system), you may want `RTLD_LOCAL` in this case. – technomage Jul 27 '16 at 21:43
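A minimal, JDK-only sketch of technomage's two suggestions: give every worker its own physical copy of the library under a unique file name, and build a JNA options map that asks `dlopen` for `RTLD_LAZY | RTLD_LOCAL` instead of JNA's default `RTLD_LAZY | RTLD_GLOBAL`. The flag values are assumptions taken from glibc's `<dlfcn.h>` on Linux and should be checked on your system; `"open-flags"` is the string value of JNA's `Library.OPTION_OPEN_FLAGS`; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LibCopies {
    // Assumed glibc/Linux values from <dlfcn.h>; verify them on your platform.
    public static final int RTLD_LAZY = 0x00001;
    public static final int RTLD_GLOBAL = 0x00100; // part of JNA's default flags
    public static final int RTLD_LOCAL = 0;

    // Same string as com.sun.jna.Library.OPTION_OPEN_FLAGS.
    public static final String OPTION_OPEN_FLAGS = "open-flags";

    /** Copies e.g. mylib.so to mylib_0.so, mylib_1.so, ... inside targetDir. */
    public static List<Path> makeCopies(Path original, Path targetDir, int n) throws IOException {
        String file = original.getFileName().toString();
        int dot = file.lastIndexOf('.');
        String stem = dot < 0 ? file : file.substring(0, dot);
        String ext = dot < 0 ? "" : file.substring(dot);
        List<Path> copies = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Path copy = targetDir.resolve(stem + "_" + i + ext);
            // A real byte-for-byte copy, not a symlink or hard link: the dynamic
            // loader can recognise a link to an already-loaded file and reuse it.
            Files.copy(original, copy, StandardCopyOption.REPLACE_EXISTING);
            copies.add(copy);
        }
        return copies;
    }

    /** Options map requesting RTLD_LAZY | RTLD_LOCAL instead of RTLD_LAZY | RTLD_GLOBAL. */
    public static Map<String, Object> localOpenOptions() {
        Map<String, Object> options = new HashMap<>();
        options.put(OPTION_OPEN_FLAGS, RTLD_LAZY | RTLD_LOCAL);
        return options;
    }
}
```

Each copy would then be loaded separately, for example with JNA 5's `Native.load(copy.toString(), MyLib.class, LibCopies.localOpenOptions())`, where `MyLib` stands for your own JNA interface. Whether `RTLD_LOCAL` alone gives each copy its own COMMON-block memory depends on how mylib.so and its Fortran runtime were linked, so it is worth checking with two copies before scaling up.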

1 Answer


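This would generally indicate an issue in your shared library or in how it is used. If you did not design the library to be thread-safe, then it most likely is not. The backtrace points the same way: every thread's call chain runs from JNA through mylib.so (from `rspbsp_` and `bspline_` down to `seterr_`, `for_write_int_fmt`, `for__acquire_lun` and `for_exit`) before it reaches libc's `__run_exit_handlers`, so libc.so is where the crash surfaces rather than where it originates. A single process cannot load the same native library file multiple times (loading it again just returns the existing handle), so there is no easy solution in that direction. There are a couple of avenues: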

  • Allocate multiple objects from your native library and use a different one for each Java thread. This may not be possible if your native library internally uses static data structures.
  • Synchronize access to the native library via the classes/methods that expose it, so that only a single Java thread can call into it at a time. However, depending on your library and use case, this may not help performance, since all calls are still funneled through one place.
  • Make the library thread-safe. This can be a lot more difficult in native languages than in Java. Additionally, some algorithms are not well suited to parallelization.
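The second option can be sketched with the JDK's `java.lang.reflect.Proxy`: every call through the returned wrapper is serialized on one lock, which is the same idea behind JNA's `Native.synchronizedLibrary()` mentioned in the comments. `MyLib`, `evaluate` and `UnsafeLib` are hypothetical stand-ins for the real JNA interface and library, with `UnsafeLib` sharing scratch state the way a COMMON block would.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

public class SyncWrapper {
    /** Hypothetical stand-in for the JNA interface mapping mylib.so. */
    public interface MyLib {
        double evaluate(double x);
    }

    /** Deliberately not thread-safe: shared scratch state, like a Fortran COMMON block. */
    public static class UnsafeLib implements MyLib {
        private double scratch;

        @Override
        public double evaluate(double x) {
            scratch = x;
            Thread.yield(); // widens the window in which another thread could interfere
            return scratch * 2;
        }
    }

    /**
     * Returns a proxy that lets only one thread at a time call into target.
     * The lock belongs to this wrapper, so all threads must share the same wrapper.
     */
    @SuppressWarnings("unchecked")
    public static <T> T synchronizedWrapper(Class<T> iface, T target) {
        final Object lock = new Object();
        InvocationHandler handler = (proxy, method, args) -> {
            synchronized (lock) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the wrapped call's own exception
                }
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

With JNA itself the equivalent would be `MyLib lib = (MyLib) Native.synchronizedLibrary(Native.load("mylib", MyLib.class));`, with `MyLib` extending `com.sun.jna.Library`. As the asker notes below, this avoids the crash only by removing the parallelism.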
Martin Serrano
  • I'm already doing the first point. The second point does of course lead to locks, so the threads are not working in parallel anymore... – Raphael Roth Jul 26 '16 at 10:02
  • JNA provides a `Native.synchronizedLibrary()` wrapper which effectively implements #2. – technomage Jul 26 '16 at 11:23
  • If you are creating multiple objects from your shared library (rather than using the same one from multiple threads), then you really don't have a solution that is parallelizable. If this is the case, I'd look at the Fortran code to see whether the code using static shared data can be refactored. If this is not possible, then you are stuck with the multiple-copies approach suggested by technomage. – Martin Serrano Jul 26 '16 at 19:24
  • @MartinSerrano I'm really confused now. I made multiple copies of my library with unique names and loaded each of them with JNA into a different variable. The image IS loaded twice, but the copies still seem to share memory: if I modify a global variable in one library, the second library sees the same value (by global I mean the variable lives in a COMMON block). Is that to be expected? – Raphael Roth Jul 27 '16 at 09:59
  • @RaphaelRoth, yes that would be confusing. I think you are running into the issue described here: http://stackoverflow.com/questions/6538501/linking-two-shared-libraries-with-some-of-the-same-symbols – Martin Serrano Jul 27 '16 at 15:35
  • @MartinSerrano OK, I think I'll open a new question for this with a running code sample. – Raphael Roth Jul 27 '16 at 18:46
  • http://stackoverflow.com/questions/38621126/load-native-fortran-libraries-without-common-memory – Raphael Roth Jul 27 '16 at 19:00
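If the multiple-copies approach does work (each copy really having its own COMMON-block memory), the first option from the answer and technomage's copies can be combined: a fixed pool of independently loaded instances, where each task borrows one instance, uses it exclusively, and hands it back. This is a minimal JDK-only sketch; `LibPool` and `withLibrary` are hypothetical names, and the pool is generic so it can hold JNA proxies or anything else.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

/** Lends out independently loaded library instances, at most one thread per instance. */
public class LibPool<T> {
    private final BlockingQueue<T> idle;

    public LibPool(List<T> instances) {
        // Capacity equals the number of instances, so returning one never blocks.
        this.idle = new ArrayBlockingQueue<>(instances.size(), false, instances);
    }

    /** Borrows an instance, applies the call, and always returns the instance afterwards. */
    public <R> R withLibrary(Function<T, R> call) throws InterruptedException {
        T lib = idle.take(); // waits while every instance is in use
        try {
            return call.apply(lib);
        } finally {
            // offer() cannot block or be interrupted, and there is always room
            // because this instance was taken from the queue above.
            idle.offer(lib);
        }
    }
}
```

The instances would be created once, e.g. one `Native.load(copyPath, MyLib.class, options)` per physical copy, and presumably kept in a static field so that all Spark tasks running in the same executor JVM share the pool; at most as many native calls run in parallel as there are copies.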