I want to implement an ApplicationChangeMonitor which monitors the filesystem for changes in the currently executed jar file. When a change is detected the application should restart. I'm using a WatchService to detect the changes.
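A minimal sketch of such a monitor, for illustration only. The class name `JarChangeMonitor`, the method `awaitChange` and its parameters are hypothetical and not the code actually used here. It watches the jar's directory and reports a change only after a quiet period has passed without further events, similar to the 500 ms buffering mentioned in the comments further down.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

final class JarChangeMonitor {

    /**
     * Waits until the jar was created or modified and no further event arrived
     * in its directory for quietMillis. Returns false if no change to the jar
     * was seen within timeoutMillis.
     */
    static boolean awaitChange(Path jar, long quietMillis, long timeoutMillis)
            throws IOException, InterruptedException {
        Path dir = jar.toAbsolutePath().getParent();
        Path name = jar.getFileName();
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            // A WatchService watches directories, so register the jar's parent.
            dir.register(watcher,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY);
            boolean changed = false;
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (true) {
                // Before the first change: wait until the overall deadline.
                // After a change: wait only for the quiet period.
                long waitNanos = changed
                        ? TimeUnit.MILLISECONDS.toNanos(quietMillis)
                        : deadline - System.nanoTime();
                WatchKey key = watcher.poll(waitNanos, TimeUnit.NANOSECONDS);
                if (key == null) {
                    return changed; // quiet period elapsed, or timed out without a change
                }
                for (WatchEvent<?> event : key.pollEvents()) {
                    // The event context is the file name relative to the watched directory.
                    if (name.equals(event.context())) {
                        changed = true;
                    }
                }
                if (!key.reset()) {
                    return changed; // the directory is no longer accessible
                }
            }
        }
    }
}
```

With this sketch the application would call something like `awaitChange(jarPath, 500, Long.MAX_VALUE / 1_000_000)` in a background thread and exit when it returns `true`.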

The setup:

  • Development in (Windows) Eclipse with a workspace on a samba share (Linux system)
  • The jar file is generated by Eclipse maven (m2e) on that samba share
  • The jar file is executed from shell on the Linux system (using openjdk)

So every time a new jar file is created, the running application should be restarted on the Linux system. First I tried to make the application restart itself, but most of the time I ran into fatal errors from the JVM. Then I chose a simpler approach: I made the application exit after a change is detected and implemented the restart mechanism in bash:

while true ; do java -jar application.jar ; done

The weird thing is, I still get fatal errors once or twice after an application change. Example:

  • java -jar application.jar <-- initial start, application is running
  • New jar file created
  • java -jar application.jar <-- fatal error
  • java -jar application.jar <-- fatal error
  • java -jar application.jar <-- application starts
  • New jar file created
  • java -jar application.jar <-- fatal error
  • java -jar application.jar <-- application starts

The output:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f46d5e2416d, pid=28351, tid=139942266005248
#
# JRE version: 7.0_25-b30
# Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libzip.so+0x516d]  Java_java_util_zip_ZipFile_getZipMessage+0x114d
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/workspace/.../target/hs_err_pid28351.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

OpenJDK creates dump files and I guess the relevant part is the stack trace leading to this fatal error:

 - Stack: [0x00007fbc9398f000,0x00007fbc93a90000],  sp=0x00007fbc93a8bd90,  free space=1011k
 - Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
 - C  [libzip.so+0x516d]  Java_java_util_zip_ZipFile_getZipMessage+0x114d
 - C  [libzip.so+0x5eb0]  ZIP_GetEntry+0xd0
 - C  [libzip.so+0x3af3]  Java_java_util_zip_ZipFile_getEntry+0xb3
 - j  java.util.zip.ZipFile.getEntry(J[BZ)J+0
 - j  java.util.zip.ZipFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry;+38
 - j  java.util.jar.JarFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry;+2
 - j  java.util.jar.JarFile.getJarEntry(Ljava/lang/String;)Ljava/util/jar/JarEntry;+2
 - j  sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+48
 - j  sun.misc.URLClassPath.getResource(Ljava/lang/String;Z)Lsun/misc/Resource;+53
 - j  java.net.URLClassLoader$1.run()Ljava/lang/Class;+26
 - j  java.net.URLClassLoader$1.run()Ljava/lang/Object;+1
 - ...

Now, does anyone have any idea why I get these fatal errors? I thought maybe it's because the jar file hasn't been written completely (that would explain why the problem comes from Java_java_util_zip_ZipFile_getZipMessage). But that's simply not the case, because the md5sum of the jar is the same before runs that end in a fatal error and before runs that work.

while true; do md5sum application.jar ; java -jar application.jar ; done
steffen
  • If the jar isn't entirely written I get (``Error: Invalid or corrupt jarfile ``). If it is I sometimes get ``Failed to instantiate SLF4J LoggerFactory`` (``java.lang.NoClassDefFoundError: ch/qos/logback/core/joran/spi/JoranException``), sometimes different classes are missing, and sometimes the fatal errors (``A fatal error has been detected by the Java Runtime Environment:``). After waiting a couple of seconds I can run the application without problems. – steffen Oct 18 '13 at 12:55
  • Did you try exiting Java after unlocking the file, as suggested in my answer? – nullptr Oct 18 '13 at 13:08
  • Also remove the checksum check from the bash loop and try it as described in my answer. – nullptr Oct 18 '13 at 13:13
  • Regarding "First I tried to make the application restart itself": how did you restart the Java process from within itself? – nullptr Oct 18 '13 at 13:15
  • If you are using another Java process to restart this Java process, I will give you another solution! – nullptr Oct 18 '13 at 13:15
  • @Meraman No, there's only one application. I tried your suggestion, but I get the same problem. I think it can't work, because either (Windows) the application can't acquire a lock on itself because the file is already open (being executed), or (Linux) the Lock will be granted immediately. Try it, it does not work. Here's how to restart the java application: http://stackoverflow.com/a/4194224/1296402 – steffen Oct 18 '13 at 14:13
  • Did you try "while true; do java -jar application.jar ; done" together with my solution? I mean without the checksum check. – nullptr Oct 18 '13 at 14:20
  • Why would I remove that? It's just a Linux command. The only effect I see is a higher probability that everything works since its execution also needs a fractional second of time that passes before the jar is started again. – steffen Oct 18 '13 at 14:31
  • Yes, but maybe the file lock has not been released by the system by the time the checksum command finishes and Java starts. I mean the file may be locked by the checksum command. – nullptr Oct 18 '13 at 14:49
  • That's more than unlikely. E.g., I can run both commands (``md5sum`` and ``java -jar``) as often as I want without problems once a couple of seconds have passed after compilation. Remember, I get a fatal error and ``ClassNotFoundException``, not something like ``Couldn't open file`` or so. – steffen Oct 18 '13 at 19:06
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/39520/discussion-between-meraman-and-steffen) – nullptr Oct 18 '13 at 19:07
  • Did you try with Oracle JDK? – nullptr Oct 18 '13 at 19:16

1 Answer

This happens because you are notified about the new file while it is still being written to disk. That is a weakness of WatchService: it notifies you as soon as a new file is created, even if it has not yet been written completely.

While the new jar file is being written, it is locked by the process that writes it. You cannot access the file until that process has released it.

The solution: try to open the file. If it opens, it has been written to disk completely. If opening fails, wait for a while (or don't wait) and try again.

To wait until the file is unlocked, implement something like this:

import java.io.FileInputStream;
import java.io.IOException;

public void unlockFile(String jarFileName) {
    while (true) {
        // Try to open the file; try-with-resources closes it again right away.
        try (FileInputStream fis = new FileInputStream(jarFileName)) {
            // Opening succeeded, so the file is no longer locked.
            return;
        } catch (IOException e) {
            // The file is still locked. Sleep briefly so the other process can
            // finish with it; skip the sleep if busy-waiting on the CPU is acceptable.
            try {
                Thread.sleep(100);
            } catch (InterruptedException ie) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}

As for your problem: you wrote, "I just made the application end itself after a change is detected and implemented the restart mechanism with bash".

So, before you end the Java process, wait for the file to be unlocked using the method above. I am sure the errors will go away. Try it and let me know the results.

Something like this:

void shutDownMethod(){
    // get file name from watcher, below line will depend on your logic and code.
    String jarFileName = watcherThread.getNewNotifiedFile();
    // unlock new jar file.
    unlockFile(jarFileName);
    // shutdown JVM
    System.exit(0);
    // bash will restart JVM
}
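A stricter variant of the same idea, sketched under the assumption that a successful open alone does not prove the archive is complete (the comments on the question report that on Linux the file can be opened immediately). Instead of only opening the file, it reads every entry to its end through `ZipInputStream`, which reads the file sequentially and checks each entry's CRC. The names `JarReadCheck` and `isFullyReadable` are hypothetical, and this is a heuristic rather than a guarantee: it does not inspect the central directory at the end of the archive.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipInputStream;

final class JarReadCheck {

    /** Returns true if the archive has at least one entry and every entry can be read to its end. */
    static boolean isFullyReadable(Path jar) {
        byte[] buffer = new byte[8192];
        int entries = 0;
        try (InputStream in = Files.newInputStream(jar);
             ZipInputStream zip = new ZipInputStream(in)) {
            while (zip.getNextEntry() != null) {
                // Reading to the end of the entry makes ZipInputStream verify its CRC.
                while (zip.read(buffer) != -1) {
                    // Discard the data; only readability matters here.
                }
                entries++;
            }
        } catch (IOException e) {
            return false; // missing, truncated or corrupt archive
        }
        return entries > 0;
    }
}
```

In `shutDownMethod()` this could be polled with a short sleep between attempts before calling `System.exit(0)`.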
nullptr
  • That's what I thought (see last paragraph). But (1) I buffer the changes and take action 500 ms after the last change (so the ``WatchService`` doesn't deliver any more change events), (2) why would md5sum print the same sum (whether it's working or not) then, (3) wouldn't java give an error like "Invalid or corrupt jarfile" instead of a fatal exit? [I implemented (1) because the process actually could acquire the lock, but then Eclipse gave error messages. Maybe these phenomena occurred due to samba specifics.] – steffen Oct 18 '13 at 12:21
  • I really didn't get what you are trying to do. In which process are you computing the checksum? – nullptr Oct 18 '13 at 12:53
  • You are getting the right checksum because that is not a Java process! – nullptr Oct 18 '13 at 12:57
  • Look at the last line: I calculate the checksum in bash, *after* the application has ended and *before* it is restarted. And now there are usually fatal errors after the first one or two tries, and after that the jar runs without problems. All three times the jar has the same checksum, calculated from shell and *before* running the jar => the file *must* have been complete all three times! – steffen Oct 18 '13 at 12:59