
I'm working to remove all system calls from an existing Java code base. We run our application in a commercially provided, closed-source JVM. When the JVM makes a system call via a Runtime.getRuntime().exec() Java call, the entire JVM process forks, which leads to serious performance hits. We run on a Linux platform but ideally try to keep things as portable as possible.

I'm running into problems replacing a sync() call we currently make via the Runtime.getRuntime().exec() method. I know there is this sync() method and flush() as well, and based on this post I'm looking to do a sync and flush on all open file streams.

My issue is that I don't have direct knowledge of which file streams and descriptors are open. I thought one way around this would be to check the /proc/(jvm process number)/fd folder, but I can't find a good way to reliably get the JVM process number using pure Java. I thought I might be able to get all objects of a certain class (the FileDescriptor class), but from what I'm reading this isn't feasible either.

Does anyone have suggestions on how to duplicate a *nix sync() call in pure Java?

eharik
  • It can't be the fork that creates the performance problems; it is the file sync operation. If you do exactly that in another way, you'll still have the same problem. – auselen Nov 12 '12 at 22:55
  • I would also suggest the following link. It is from the Android developers, but since we are talking about Linux / Java implementations it contains related information. http://android-developers.blogspot.se/2010/12/saving-data-safely.html – auselen Nov 12 '12 at 22:57
  • @auselen - Sorry, my original post wasn't entirely clear in that regard. The problem is that at application startup we budget available memory on the assumption that the entire process might be forked. This essentially cuts our available memory in half, which impacts overall performance. – eharik Nov 12 '12 at 23:35
  • @eharik - you simply don't need to do that to deal with `Runtime.exec()`. The JVM will be using "vfork" rather than "fork", and a "vfork" syscall followed by an "exec" syscall does minimal copying. – Stephen C Nov 12 '12 at 23:46
  • @StephenC - Is this true? There seems to be some [conflicting](http://www.bryanmarty.com/2012/01/forking-jvm/) [information](http://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error-12-cannot-allocate-memory-calling-run) out there on this issue. In the JVM implementation that we are using, system calls do try to allocate the same amount of memory as the parent process. I just ran a simple test where I allocated over 50% of system memory to the Java process and called Runtime.exec(), and the application crashed. – eharik Nov 13 '12 at 15:57

1 Answer


What you are doing is more than a sync call. You are trying to do a "flush all file buffers and sync" operation. You would have trouble doing this in C / C++ too.

In addition to the problem of finding all of the open files (which you probably could solve ...), there is a bigger problem; i.e. whether it is the right time to flush the buffers.

Let's assume that your application is multi-threaded, and that one thread is responsible for calling sync. How does that thread know that the other threads that are writing files have reached a consistent point with respect to those files; i.e. that if the application was killed and restarted, the (hypothetically) flushed files would contain a logically consistent state for the application? The answer is (most likely) that it doesn't know. So, in fact, the application is not in a significantly better position if it flushes before syncing.

And there is yet another problem. Assume that thread A is responsible for flush / sync, and thread B is happily writing to some output stream. Consider this temporal sequence:

  1. Thread A flushes file
  2. Thread B writes to file
  3. Thread A calls sync

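The data written in step 2 may still be sitting in an application-level buffer when thread A calls sync, so the sync does not cover it. The only way to avoid this is to have thread A synchronize with and block all other threads that are writing to files before it does the flush(es) and the sync.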
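As a rough illustration of that blocking, here is a minimal sketch using a `ReentrantReadWriteLock`. The `SyncGate` class and its method names are hypothetical and not taken from the question; it assumes every writer goes through the gate and that streams are registered by hand, since nothing in it discovers open files on its own.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper: the class and method names are illustrative only.
final class SyncGate {
    // Writers share the read lock; the syncing thread takes the write lock.
    private final ReentrantReadWriteLock gate = new ReentrantReadWriteLock();
    private final List<FileOutputStream> streams = new CopyOnWriteArrayList<>();

    // Streams must be registered explicitly by whoever opens them.
    void register(FileOutputStream out) {
        streams.add(out);
    }

    // Called by writer threads (thread B). Several writers may run at once.
    void write(FileOutputStream out, byte[] data) throws IOException {
        gate.readLock().lock();
        try {
            out.write(data);
        } finally {
            gate.readLock().unlock();
        }
    }

    // Called by the syncing thread (thread A). Waits for in-flight writes
    // to finish and keeps new ones out until every stream is synced.
    void flushAndSyncAll() throws IOException {
        gate.writeLock().lock();
        try {
            for (FileOutputStream out : streams) {
                // flush() is a no-op on a bare FileOutputStream; a buffering
                // wrapper would have to be flushed here instead.
                out.flush();
                // Ask the OS to push its buffers for this file to the device.
                out.getFD().sync();
            }
        } finally {
            gate.writeLock().unlock();
        }
    }
}
```

This only closes the race between flush and sync. It does not tell thread A whether the files are in a logically consistent state at the moment it takes the lock.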

My advice would be to just do the sync, and forget about the flushes. Deal with the problem of inconsistent files the classic way (by having the application write to a temporary file, and do an atomic rename), or by having the sync thread coordinate with the thread(s) writing the file ... so that it only "syncs" when the critical files are consistent.
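The temporary-file-plus-rename approach could look roughly like the sketch below. It assumes Java 7's `java.nio.file` API is available; `AtomicFileWriter` and `writeAtomically` are hypothetical names. Whether `ATOMIC_MOVE` replaces an existing target is implementation specific according to the Javadoc, so treat the overwrite behaviour as an assumption to verify on your platform.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: the class and method names are illustrative only.
final class AtomicFileWriter {
    static void writeAtomically(Path target, byte[] data) throws IOException {
        // Create the temp file next to the target so the rename stays on
        // one file system.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "tmp-", ".part");
        try {
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(data);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                // Force file content and metadata to the storage device
                // before the new name becomes visible.
                ch.force(true);
            }
            // Readers see either the old file or the complete new one.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up after a failure.
            Files.deleteIfExists(tmp);
        }
    }
}
```

A call such as `AtomicFileWriter.writeAtomically(Paths.get("state.dat"), bytes)` then replaces the per-file flush and sync, and no file-system-wide sync is needed for that file.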

Stephen C
  • Thanks for the input. Based on your response and some discussion with folks on my team I think that the calling thread will need to supply the file stream object and be responsible for calling sync when appropriate. A file system wide sync is wasteful from a resource perspective and problematic from an I/O integrity standpoint as you've pointed out. – eharik Nov 12 '12 at 23:33
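For reference, a per-stream flush and sync in pure Java might look like the following sketch. `DurableWrite` and `writeAndSync` are hypothetical names; the point is only that the code owning the stream flushes the Java-level buffer and then calls `FileDescriptor.sync()` on that one file.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper: the class and method names are illustrative only.
final class DurableWrite {
    static void writeAndSync(String path, byte[] data) throws IOException {
        try (FileOutputStream file = new FileOutputStream(path);
             OutputStream out = new BufferedOutputStream(file)) {
            out.write(data);
            // Move the bytes from the Java-level buffer to the OS.
            out.flush();
            // Block until the OS reports this file's buffers written to the
            // device; throws SyncFailedException if that cannot be guaranteed.
            file.getFD().sync();
        }
    }
}
```

Unlike a file-system-wide sync, this touches only the one descriptor, and the caller decides when the file is in a state worth syncing.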