
I have a Java process which reads a given file using RandomAccessFile and does some processing based on the file contents. The file is a log file which gets updated by another Java process. The reading process runs on another machine and has an NFS mount set up to access the file on the remote server. Essentially, the reading process polls for changes in the file based on the file length and the position of the RandomAccessFile, and calls a handler method for each byte encountered. The issue is that I sometimes get ASCII NUL characters back from the RandomAccessFile read method

int charInt = randomAccessFile.read();

that is, charInt returns 0 on some occasions, and only after some time does it return valid characters. But by then I have missed the characters that were read as NULs.

I tried using http://commons.apache.org/io/apidocs/org/apache/commons/io/input/Tailer.html, where I get notified of each line, but in these lines I sometimes notice the ASCII NUL characters. I have also gone through Java IO implementation of unix/linux "tail -f" - my Java process is something similar - but I am starting to think the issue is with the NFS mount, or with buggy Java IO when trying to read from an NFS mount. I carried out some tests reading from a normal file (not on an NFS mount) while another process continuously wrote to it; all these tests were successful. I also tried a BufferedReader, since the file stream is really a character stream even though I can treat it as a byte stream. I still get the NUL characters.

Not sure whether this matters: the NFS mount is a read-only (ro) one. I'd appreciate any help on this. Thanks.
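For reference, this is roughly the shape of the polling reader described above; the class and handler names are made up for illustration, and treating a NUL byte as "not yet committed" (so it is retried on the next poll rather than consumed) is one way to cope with the symptom:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Minimal polling tailer along the lines described above. The class and
// handler names are illustrative, not from any library. A NUL byte (0) is
// treated as "not yet committed by the NFS/NAS client": the reader stops
// there and retries from the same offset on the next poll.
class NulAwareTailer {
    interface ByteHandler {
        void handle(int b);
    }

    private final String path;
    private long position; // offset of the next byte not yet delivered

    NulAwareTailer(String path) {
        this.path = path;
    }

    // Deliver any bytes appended since the last poll, skipping nothing.
    void poll(ByteHandler handler) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            if (raf.length() <= position) {
                return; // nothing new yet
            }
            raf.seek(position);
            int b;
            while ((b = raf.read()) != -1) {
                if (b == 0) {
                    break; // uncommitted hole: keep position here, retry later
                }
                handler.handle(b);
                position = raf.getFilePointer(); // advance only past good bytes
            }
        }
    }
}
```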

I tried the following as well:

    BufferedWriter bufWriter;
    try {
        bufWriter = new BufferedWriter(new FileWriter("<OUT_FILE>", true));
    } catch (IOException e) {
        throw new RuntimeException("Exception while creating file to write sent messages", e);
    }

    Process p = Runtime.getRuntime().exec("tail -f <PATH_TO_IN_FILE>");
    Scanner s = new Scanner(p.getInputStream());
    while (s.hasNextLine()) {
        String line = s.nextLine();
        bufWriter.write(line);
        bufWriter.write(System.getProperty("line.separator"));
        bufWriter.flush();
    }
    bufWriter.close();

and still I am getting the NUL characters. Here I am writing the lines I read to a file, so that I can then compare the IN file and the OUT file. I see that on some occasions lines are skipped (replaced with NUL characters); all other lines compare fine - so out of about 13000 lines, we see a mismatch in about 100 lines.

Another strange thing: I had less running on the file, and I can see the NUL characters there as well. They appear in the form of ^C^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ (a long run of NULs) followed by valid lines.

One more thing I noticed: during the time the lines were missed, the file was being updated very quickly by the writing process. An XML message was written to the file at 20110729 13:44:06.070097 and the next one at 20110729 13:44:06.100007; lines were missed from this second XML message.

More findings: the file path we are reading the files from is on a shared NAS.
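For what it's worth, the IN/OUT comparison described above can be sketched like this (the class name and the returned counts are illustrative; lines are compared by index):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Illustrative sketch of the comparison described above: count lines in
// the OUT copy that differ from the IN file, and flag how many of the
// mismatches contain NUL characters. The class name is made up.
class FileDiffCheck {
    // Returns { mismatchedLines, mismatchesContainingNul }.
    static int[] compare(String inPath, String outPath) throws IOException {
        List<String> in = Files.readAllLines(Paths.get(inPath));
        List<String> out = Files.readAllLines(Paths.get(outPath));
        int mismatched = 0;
        int withNul = 0;
        int n = Math.min(in.size(), out.size());
        for (int i = 0; i < n; i++) {
            if (!in.get(i).equals(out.get(i))) {
                mismatched++;
                if (out.get(i).indexOf('\0') >= 0) {
                    withNul++; // the bad copy contains NULs, not real data
                }
            }
        }
        return new int[] { mismatched, withNul };
    }
}
```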

gregoryp

2 Answers


I realize this question is now more than a year old, but I will add what I know to it, in case others with this issue stumble across it as I have.

The NUL characters described in this question appear due to asynchronous writes to the file being read from. More specifically, packets of data from the remote file writer have arrived out of order, and the NAS buffer has committed a later packet and padded the area for the unreceived data with NUL characters. When the missing packet is received, the NAS buffer commits it, overwriting those null characters.

In the application where we first encountered this, we are reading a file line by line, and keeping track of the last line number successfully read (so we can stop at any time and start up again where we left off). Our interim solution for handling this is simply to check specifically for the "\0" on every read and, when it is encountered, close the file, wait 1 second and reopen the file, queuing up to where we left off. Usually, by the time we read the line again, the actual text has been committed.
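The close/wait/reopen loop described above could be sketched like this. Class and handler names are made up; the one-second wait and the check for "\0" on every line are as described (this version reads to the current end of file, retrying NUL regions, rather than tailing forever):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Sketch of the interim fix described above: on seeing a NUL in a line,
// close the file, wait a second, reopen, and queue back up to the last
// line that was read successfully. Names are illustrative.
class NulRetryingLineReader {
    interface LineHandler {
        void handle(String line);
    }

    static void readAll(String path, LineHandler handler)
            throws IOException, InterruptedException {
        long lastGoodLine = 0; // lines already delivered to the handler
        boolean done = false;
        while (!done) {
            try (BufferedReader in = new BufferedReader(new FileReader(path))) {
                for (long i = 0; i < lastGoodLine; i++) {
                    in.readLine(); // skip what we have already handled
                }
                done = true;
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.indexOf('\0') >= 0) {
                        done = false; // uncommitted region: back off and retry
                        break;
                    }
                    handler.handle(line);
                    lastGoodLine++;
                }
            }
            if (!done) {
                Thread.sleep(1000); // give the NAS time to commit the real data
            }
        }
    }
}
```

Usually, by the time the line is re-read after the pause, the real text has been committed, so the loop makes progress.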

While closing and reopening the file may seem dramatic, recovering without doing this is problematic. You can't Mark/Reset the BufferedReader to resolve it, because once characters are read into the reader's Buffer they will not be reread from the file, only regurgitated every time you try and read again.

Getting the underlying FileChannel, and reading and setting position() also fails because your position in the file includes characters read into the buffer that you may not have seen yet, and you will end up skipping that unseen data.

We are testing a solution where we have extended the InputStreamReader class and overridden the read(char[], int, int) method to use the FileChannel to get the position before each read, call the superclass's read method, check for \0, and reset the FileChannel position if it is found, returning 0 as the number of characters read.
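A sketch of that idea is below, but implemented directly on a FileChannel rather than by subclassing InputStreamReader: as noted above, once bytes land in a reader's internal buffer they are served from there, so resetting the channel position under an InputStreamReader would not discard the stale NULs it has already buffered. Single-byte (ASCII/ISO-8859-1) content is assumed, and the class name is made up:

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Illustrative Reader that reads straight from a FileChannel so the
// position can be rewound safely. On hitting a NUL it delivers only the
// clean prefix, resets the channel to the NUL, and lets the caller retry;
// a read may therefore return 0 while the hole persists.
class NulAwareReader extends Reader {
    private final FileChannel channel;

    NulAwareReader(String path) throws IOException {
        this.channel = FileChannel.open(Paths.get(path), StandardOpenOption.READ);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        long mark = channel.position();      // where this read began
        ByteBuffer buf = ByteBuffer.allocate(len);
        int n = channel.read(buf);
        if (n <= 0) {
            return n;                        // EOF (-1) or nothing read
        }
        for (int i = 0; i < n; i++) {
            if (buf.get(i) == 0) {
                channel.position(mark + i);  // retry the hole on the next read
                n = i;                       // deliver only the clean prefix
                break;
            }
        }
        for (int i = 0; i < n; i++) {
            cbuf[off + i] = (char) (buf.get(i) & 0xff); // 1 byte = 1 char
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```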

Gregg Dale

Did you try something like this:

  BufferedReader input = new BufferedReader(new FileReader(args[0]));
  String currentLine = null;
  long sleepTime = 1000; // poll interval in ms

  while (true) {
    if ((currentLine = input.readLine()) != null) {
      System.out.println(currentLine);
      continue;
    }
    try {
      Thread.sleep(sleepTime);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      break;
    }
  }

If nothing can be read from the file, currentLine will be null ...

I doubt there is a specific NFS + Java problem, the fact that you access a file via NFS should be unknown to the VM.

Angel O'Sphere
  • Thanks. Yes, I have tried that - the issue is that on some occasions I get NUL characters in the lines printed out where I expect valid characters. Also, sometimes the lines returned get long because the stream is unable to read the LF or CR characters and reads NUL characters instead. – gregoryp Jul 26 '11 at 11:26
  • Then it could be an encoding problem: when you open the file you can say whether it is e.g. UTF-8 encoded, and if really needed you can also set the line endings. The problem might come from the fact that the operating system your Java code runs on is different from the one that writes to the NFS-mounted file. – Angel O'Sphere Jul 26 '11 at 11:36
  • I don't have control of the process which writes to the file. The operating systems are the same - just checked. Not sure what encoding the JVM uses, though; I believe it should be in some system property. I was not able to reproduce the issue when the writing process was running on the same machine as the reading process, but the issue came up when the writing was done on a remote server. Also, the frequency at which the file was updated/written was the same under both scenarios. – gregoryp Jul 26 '11 at 13:26