
I have a small Java application running a set of computationally heavy tasks. For processing the tasks, I use an external library which does most of the computation via native methods and some C code. Unfortunately, after solving one task, the library suffers from heavy memory leaks and can therefore only solve one task per application execution.

The memory problem is known to the library's developers, but it is not fixed yet and maybe never will be (it has something to do with the Java garbage collector not properly working with the native interface). Since there is no alternative to this particular library, I am looking for options to solve the tasks via sequential application executions.

Currently, I have a bash wrapper script, which gets a list of tasks that should be executed and for each task the script calls the application with just this single task to execute.

Since tasks often need the results of previous tasks, this involves serializing and deserializing execution results to files. This does not seem like good practice to me, not least because the user has basically no way to interact with the program's control flow.

Does anybody have an idea how I can do this sequential task execution inside one single Java application? I guess this would involve starting a new JVM for each task execution, hopefully transferring only the task result, and not the memory leaks, from the new JVM to my application.

Edit providing further information:

  • Changing the root of the problem: Unfortunately, the library is not open source, and I have access neither to the native methods nor to the Java interface API.

  • New processes / JVMs: Are those the same thing in this context? I have little experience with the Java process API or with starting new JVMs. My assumption is that this would involve starting a separate Java program, with its own main method, using ProcessBuilder.start()?

  • Exchange of data: It is only a couple of kilobytes, so performance is not an issue. Still, a solution without files would be preferable, but if I understand correctly, memory-mapped files also use local files. Sockets, on the other hand, do sound promising.
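
To illustrate the assumption in the bullets above: starting a separate JVM with ProcessBuilder and reading its result back over a pipe (instead of a file) can be sketched roughly like this. The class name TaskLauncher and the "result:" line protocol are made up for the example, and it assumes the class is compiled so the child JVM can reuse the parent's classpath:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class TaskLauncher {
    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("child")) {
            // Child JVM: stand-in for one leaky task; print the result to stdout.
            System.out.println("result:42");
            return;
        }
        // Parent JVM: launch a fresh JVM for the task, reusing our own classpath.
        ProcessBuilder pb = new ProcessBuilder(
                System.getProperty("java.home") + "/bin/java",
                "-cp", System.getProperty("java.class.path"),
                "TaskLauncher", "child");
        pb.redirectErrorStream(true);
        Process p = pb.start();

        String result = null;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.startsWith("result:")) result = line.substring(7);
            }
        }
        int exit = p.waitFor();
        System.out.println("child exited " + exit + ", result = " + result);
    }
}
```

Once the child exits, the OS reclaims all of its memory, leaks included; only the printed result crosses back into the parent.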

  • Shame the JVM doesn't have the equivalent of an AppDomain (Isolate what?) .. anyway the approach mentioned, of starting a separate process/JVM and using IPC to transfer the data, sounds like a "suitable" hack; barring actually using a library that is implemented half-correctly. (But maybe there are some methods that can be called *manually* to release the underlying resources? These should be documented if they exist.) – user2864740 Sep 26 '14 at 08:55
  • Can you not call directly into the supporting C library underneath the JNI? Then you have finer control over the memory and cut out the garbage collector. – Bathsheba Sep 26 '14 at 09:22

2 Answers


Funnily enough, I've faced the same issue. You have to accept that nothing will be nice or best practice when you are stuck with a faulty library that you must use but cannot upgrade.

The solution we came up with was to isolate calls to the library in its own process, a child of a master process. The master process contains the good code and the child the bad. We then kept track of the number of invocations of the child process and tore it down once it reached a certain number: we knew that we could get away with X invocations before the child process was corrupted.

Because of the nature of our problem, bringing up a fresh process enabled us to have another X invocations before repeating.

Any state was returned to the master process on a successful invocation. Any state gathered during an unsuccessful invocation was discarded and we started again.
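
The recycling scheme described above can be sketched as follows; this is a minimal, self-contained illustration, not the original implementation. The class name RecyclingMaster, the line-based stdin/stdout task protocol, and MAX_INVOCATIONS are mine, and a squared-number "task" stands in for the real faulty library call. It assumes the class is compiled so the child JVM can reuse the parent's classpath:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;

public class RecyclingMaster {
    static final int MAX_INVOCATIONS = 3; // assumed number of calls a child survives

    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("child")) {
            runChild();
            return;
        }
        Process child = null;
        BufferedReader fromChild = null;
        PrintWriter toChild = null;
        int used = 0;
        for (int task = 1; task <= 7; task++) {
            // Tear down the child after MAX_INVOCATIONS and bring up a fresh one.
            if (child == null || used == MAX_INVOCATIONS) {
                if (child != null) {
                    toChild.close();   // closing stdin lets the child exit
                    child.waitFor();
                }
                child = new ProcessBuilder(
                        System.getProperty("java.home") + "/bin/java",
                        "-cp", System.getProperty("java.class.path"),
                        "RecyclingMaster", "child").start();
                fromChild = new BufferedReader(
                        new InputStreamReader(child.getInputStream()));
                toChild = new PrintWriter(child.getOutputStream(), true);
                used = 0;
                System.out.println("started fresh child");
            }
            toChild.println(task); // send one task, read one result back
            System.out.println("task " + task + " -> " + fromChild.readLine());
            used++;
        }
        toChild.close();
        child.waitFor();
    }

    // Child JVM: reads task ids from stdin, answers with the squared value,
    // standing in for the leaky native call.
    static void runChild() throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            int n = Integer.parseInt(line.trim());
            System.out.println(n * n);
        }
    }
}
```

With 7 tasks and X = 3, the master brings up three child processes in total; state only crosses the process boundary on a successful invocation.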

Again, none of the above is "nice" but it worked for us.

For what it's worth, if I did this again, I'd use Akka and remote actors, which would make all the sub-process management, remoting, etc. far simpler.

imrichardcole

That depends. Do you have the source code of this external library, i.e., can you recompile it? The easiest approach is obviously to fix the leak at its root, but that might be impractical. If the library is, as you say, implemented via native methods and some C code, I doubt that the problem has anything to do with the Java garbage collector not working properly. Native methods and C code do not normally store their data on the JVM's heap and are therefore not garbage collected; it is the job of the library to clean up after itself.

If the leak is indeed in the bit of Java code that the library exposes, then there is a way out. Memory leaks in Java occur by holding on to forgotten references, e.g., consider the following example:

class Foo {

  private ExpensiveObject eo; 

  Foo(ExpensiveObject eo) {
    this.eo = eo;
  }
}

The ExpensiveObject stays alive at least as long as the Foo instance referencing it. If you (or your library) do not manage instance life-cycles carefully enough, you get into trouble. If you have no chance to refactor, you can however use reflection to clean up the biggest mess from another place in your code:

void release(Foo foo) throws ReflectiveOperationException {
  Field f = Foo.class.getDeclaredField("eo");
  f.setAccessible(true); // bypass the private modifier
  f.set(foo, null);      // drop the reference so the GC can collect it
}

This should however be considered a last-resort as it is quite a hack.

Alternatively, a better approach is normally to fork another JVM instance to do the dirty work, which is similar to what you are doing already. By forking a JVM, you isolate the memory use at the process level: once the process dies, all of its memory is released by the OS. The usual drawback of this approach is platform compatibility, but since you already depend on a native library, it does not worsen your situation.

You say that you currently use files to communicate between these processes. Why do you need to store the data in a file at all? Consider using sockets or memory-mapped files (NIO) instead, if performance matters for this exchange.
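
As a sketch of the socket variant: the parent opens a ServerSocket on a free port, forks the child JVM with the port number as an argument, and the child connects back to deliver its result, so no file ever touches disk. The class name SocketExchange and the payload string are made up for the example, which assumes the class is compiled so the child can reuse the parent's classpath:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketExchange {
    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            // Child JVM: connect back to the parent and send the task result.
            int port = Integer.parseInt(args[0]);
            try (Socket s = new Socket("localhost", port);
                 DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
                out.writeUTF("task-result-payload"); // a few KB would be fine too
            }
            return;
        }
        try (ServerSocket server = new ServerSocket(0)) { // 0 = pick a free port
            Process child = new ProcessBuilder(
                    System.getProperty("java.home") + "/bin/java",
                    "-cp", System.getProperty("java.class.path"),
                    "SocketExchange",
                    String.valueOf(server.getLocalPort())).start();
            try (Socket conn = server.accept();
                 DataInputStream in = new DataInputStream(conn.getInputStream())) {
                System.out.println("received: " + in.readUTF());
            }
            child.waitFor();
        }
    }
}
```

For a couple of kilobytes per task, a plain loopback socket like this is more than fast enough, and the parent never has to clean up temporary files.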

Rafael Winterhalter