
In my Java code I'm reading an NFS-mounted directory (the code runs on the NFS client machine). Everything is fine as long as the NFS server is up and running, but when the server is down (for any reason), the code hangs wherever it creates a new File for the NFS-mounted directory. If I simply umount the NFS directory, my code runs with no problem, but I don't want to check for such problems manually every day; I want to handle this scenario in my code alone.

This is the /etc/exports on the NFS server:

/var/nfs/general *(rw,insecure,all_squash,no_subtree_check)

The actual Java code is simply:

log.info("before new");
File file = new File("/var/nfs/general");
log.info("after new");

It prints only "before new" to the log file and never reaches "after new".

I ran the new File call in an ExecutorService with a timeout, as suggested in "How do I call some blocking method with a timeout in Java?", but it still hangs, even with a 2-second timeout.

  • OS: Ubuntu Server 16.04 on both machines (NFS client and server)
  • Java version: 1.8_172

f.ald
  • You might want to make a soft mount of the NFS drive on the client side, which will avoid hanging until the server is back up when it is called. – Aaron Jan 15 '19 at 09:43
  • Similar question (with similar answer) on the ServerFault SE : https://serverfault.com/questions/710391/nfs-server-hung-threads – Aaron Jan 15 '19 at 09:47
  • A soft mount didn't help either; it still hangs – f.ald Jan 15 '19 at 09:58
  • Hmm, there's a `timeo=` value you can specify on the NFS mount when you set it to soft, did you define it? Otherwise the default seems to be 1 minute, which might be enough to let you think it still hangs indefinitely. Could you add the relevant line from your `/etc/fstab` to your question if you're still stuck? Edit : in addition there can be a `retrans=n` (default n=3) option which multiplies the time to wait for a read error (timeout is for one read failure, retries up to n read failures) – Aaron Jan 15 '19 at 10:03
  • In my testing environment I haven't used /etc/fstab; I only used the mount command with -o soft,intr – f.ald Jan 15 '19 at 10:08
  • the mount command should result in modifications to the /etc/fstab file, I didn't mean you should edit it yourself. Check `man nfs` for more info – Aaron Jan 15 '19 at 10:26
  • Thanks @Aaron, timeo=1 did the trick, so I'm now using mount -o soft,intr,timeo=1. I also checked this by stopping the NFS server service, and it was OK too. I was also worried about scenarios where the NFS client hangs as well, which is why I insisted on catching the error or timeout in my code, but so far so good – f.ald Jan 15 '19 at 10:31
  • It's a bug in the NFS implementation. It's not java specific. There is no fix. – rustyx Jan 15 '19 at 11:11
  • @rustyx My problem with Java is that it blocks the calling method forever, and even an ExecutorService with a timeout could not help – f.ald Jan 15 '19 at 11:53

1 Answer


You can wrap the call in a Callable and use get() with a timeout. The sample below times out after 20 seconds if the result (the File here) is not available by then.

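// Wrap the potentially blocking call in a FutureTask.
FutureTask<File> futureFile = new FutureTask<File>(new Callable<File>() {
    public File call() throws Exception {
        return new File(filePath);
    }
});

// The task must be run on another thread (for example through an
// ExecutorService) before get() is called; otherwise get() only times out.
futureFile.get(20, TimeUnit.SECONDS);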

For testing purposes I set the timeout to 3 ns, so the call is all but certain to time out in this case.

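import java.io.File;
import java.util.concurrent.*;

public static File readTimeOut(final String filePath, int timeOutInMillis)
        throws InterruptedException, ExecutionException, TimeoutException {
    ExecutorService executor = Executors.newFixedThreadPool(1);
    FutureTask<File> futureFile = new FutureTask<File>(new Callable<File>() {
        public File call() throws Exception {
            System.out.println("I am called");
            // Path hard-coded for the test; use filePath in real code.
            return new File("/usr/mohamed");
        }
    });
    executor.execute(futureFile);
    // Timeout hard-coded to 3 ns for the test; real code would use
    // futureFile.get(timeOutInMillis, TimeUnit.MILLISECONDS).
    return futureFile.get(3, TimeUnit.NANOSECONDS);
}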

(I kept everything simple, so resources such as the executor aren't closed properly.) But this can surely be handled with FutureTask.
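As a variation on the same idea, here is a minimal sketch that runs the file-system check on a daemon thread, so that a worker stuck on an unreachable share cannot keep the JVM alive. The class name NfsProbe, the method isReachable, and the choice of File.isDirectory() as the probe are illustrative assumptions, not part of the original answer. Whether the caller really gets control back in the NFS case depends on the mount options and the kernel, so treat it as something to try rather than a guaranteed fix.

```java
import java.io.File;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper class; the names are illustrative only.
public class NfsProbe {

    // Daemon threads do not keep the JVM alive, so a worker that never
    // returns from a blocked file-system call cannot prevent process exit.
    private static final ExecutorService PROBES =
            Executors.newCachedThreadPool(new ThreadFactory() {
                public Thread newThread(Runnable r) {
                    Thread t = new Thread(r, "nfs-probe");
                    t.setDaemon(true);
                    return t;
                }
            });

    /** Returns true only if the path answered as a directory within the timeout. */
    public static boolean isReachable(final String path, long timeoutMillis)
            throws InterruptedException {
        Future<Boolean> probe = PROBES.submit(new Callable<Boolean>() {
            public Boolean call() {
                // This call touches the file system and may block on a dead mount.
                return new File(path).isDirectory();
            }
        });
        try {
            return probe.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort only: a thread blocked inside the kernel may ignore this.
            probe.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }
}
```

A caller could guard the real work with something like `if (NfsProbe.isReachable("/var/nfs/general", 2000)) { ... }`. Note that every probe that never returns leaves one blocked thread behind, so a long-running service should limit how often it probes.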

Mohamed Anees A
  • Can you please post the stack-trace in such case with my solution and/or debug with your application to provide more insight on what is happening? I'm pretty sure FutureTask and get() should work seamlessly. – Mohamed Anees A Jan 15 '19 at 10:31
  • I expected that to work too, but execution simply did not move to the next line after get(): no error or exception, it just hangs. I tested it with both Run and Debug in IntelliJ – f.ald Jan 15 '19 at 10:33
  • That is impossible. Are you sure you submitted the FutureTask to any ExecutorService and execute() method is called? Please refer to my second edit. – Mohamed Anees A Jan 15 '19 at 10:57
  • When I call readTimeOut from Evaluate Expression in IntelliJ (Debug mode), it hangs and shows result = Collecting Data, but the method works fine with non-remote directories – f.ald Jan 15 '19 at 11:51
  • @f.ald is half correct. The FutureTask returns correctly, but the executor thread is stuck forever. As far as I know there is no solution for this, since the underlying Linux call blocks indefinitely. For example, you will get the same result if you `ls` the share directory. – wlfbck Jan 29 '21 at 08:25
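The behaviour described in the last comment (get() times out, yet the worker stays busy) can be reproduced without an NFS share. The sketch below is only a simulation under stated assumptions: the class StuckWorkerDemo and its method are hypothetical, and the "blocked call" is a task that deliberately swallows interrupts, which stands in for a thread blocked inside the kernel rather than reproducing it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; it simulates a blocked call, it does not touch NFS.
public class StuckWorkerDemo {

    /** Returns true if get() timed out and the worker was still busy after cancel(true). */
    public static boolean workerSurvivesTimeout() throws Exception {
        final CountDownLatch started = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());

        Future<?> stuck = executor.submit(new Runnable() {
            public void run() {
                started.countDown();
                // Stand-in for a blocked file-system call: ignore interrupts
                // and keep waiting until the test releases the latch.
                boolean done = false;
                while (!done) {
                    try {
                        release.await();
                        done = true;
                    } catch (InterruptedException ignored) {
                        // swallow the interrupt and keep waiting
                    }
                }
            }
        });

        started.await(); // make sure the task is really running
        boolean timedOut = false;
        try {
            stuck.get(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            timedOut = true;     // the caller does get control back...
            stuck.cancel(true);  // ...but the interrupt does not stop the task
        }
        Thread.sleep(100);
        boolean stillBusy = executor.getActiveCount() == 1;

        release.countDown();     // let the simulated call finish
        executor.shutdown();
        return timedOut && stillBusy;
    }
}
```

In this simulation the caller regains control after the timeout while the single pool thread remains occupied, so any later task submitted to the same one-thread executor would queue behind it.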