
Is there some way to open a file on Unix for writing that actually blocks, i.e. waits for the data to be in the file before completing the write operation?

I have a program that takes result lines from a list and writes them to a file.

My current implementation is as follows (I removed the processing done on each line):

    ArrayList<String> lines = new ArrayList<>();
    try (BufferedReader in = new BufferedReader(new FileReader(inFile))) {
      String line;
      while ((line = in.readLine()) != null) {
        lines.add(line);
      }
    }
    int written = 0;
    try (PrintWriter out = new PrintWriter(new FileWriter(outFile))) {
      for (String l : lines) {
        written++;
        out.println(l);
        if (written % 10000 == 0)
          Message.info("lines written " + written);
      }
    }

The current behaviour is the following:

    [17:12:43 INFO ] lines written 10000
    [17:12:43 INFO ] lines written 20000
    [17:12:43 INFO ] lines written 30000
    [17:12:44 INFO ] lines written 40000
    [17:12:44 INFO ] lines written 50000
    [17:12:44 INFO ] lines written 60000
    [17:12:45 INFO ] lines written 70000
    [17:12:45 INFO ] lines written 80000
    [17:12:45 INFO ] lines written 90000
    [17:12:46 INFO ] lines written 100000

The program runs very fast, but at the end it waits 30-40 seconds for all the data (4 GB) to be written to the file before ending (I can see the file growing on the disk during this time).
What I want is for my INFO message to be displayed only when the data is really in the file: the program will seem slower, but will end immediately after the last message.

I tried using BufferedWriter (from new BufferedWriter() and Files.newBufferedWriter()), FileOutputStream, and OutputStream, but all of these seem to offer only non-blocking IO operations.

The program is run on Ubuntu (I read that the implementation is filesystem-dependent).

So, is there some way to wait for a println()/write() operation to complete before executing the next line of code?

I "feel" that Java has delegated the actual writing to the OS, and doesn't wait for the data to be on the disk before going on; but since Java waits for the last write to complete before exiting, there must be a way to wait after each line.

  • Those are blocking operations, it's not a blocking/non-blocking issue. There are buffers to make things faster, but if you call `out.flush()` every once in a while they should be flushed earlier to disk. That will make things slower though. – Kayaman Sep 19 '19 at 16:16
  • No, adding flush() after each println() doesn't "space out" the progress of the program. It is my understanding that println() itself is autoflushing anyway. I think the writing is "over" for Java, and flushed to Linux. – Thomas Ludwig Sep 20 '19 at 07:12
  • Really? Putting `flush()` after every `out.println()` should've at least made your program slow as a snail. `PrintWriter` only autoflushes if you enable it in the constructor (which you haven't done). – Kayaman Sep 20 '19 at 07:15
  • I tried both, with/without autoflush, with/without adding an explicit flush(), and I have the same running time (36 s to read the data from the source, 15 s to finish the writing, and 34 s to wait for the program to end). Again, I think Java gets an acknowledgement of the write from the OS and believes the data to be written, and then the close() operation takes a long time – Thomas Ludwig Sep 20 '19 at 07:54
  • I found a dirty workaround. Every 10,000 lines, I close the file and open it again in append mode. This has the expected behaviour with only an acceptable time overhead – Thomas Ludwig Sep 20 '19 at 08:08
  • Have you considered streaming the lines instead of loading them all in the memory first? – Kayaman Sep 20 '19 at 08:13
  • It's a simplification of my program. In reality I read the lines, create objects from them, do some multi-threaded processing on them, and serialize the results as "other" lines (analysis results) – Thomas Ludwig Sep 20 '19 at 12:26
  • Well, you might want to have a look at https://stackoverflow.com/questions/730521/really-force-file-sync-flush-in-java but wouldn't you be better off optimizing the performance instead of trying to control when data is flushed to disk? For example, it still doesn't sound like you need to read the data fully before starting processing, so you could mitigate the IO times by doing them concurrently. In fact, you're using a raw `PrintWriter` with `FileWriter` without any buffering. Consider throwing a `BufferedWriter` in between, and then calling `flush()` every now and then. – Kayaman Sep 20 '19 at 12:40
  • In my real-life application, I don't read the full input file at once. I have 1 Reader Thread to convert each line into an Object, N Worker Threads to process them, and 1 Consumer to serialize the results. So performance-wise, everything runs (too) quick. It's just that I have a long wait time after the Consumer exits the ThreadPool, so I wanted to sync it with the real I/O: the program would then end at the same time as the last progress message is printed... But your link seems interesting, thanks! – Thomas Ludwig Sep 20 '19 at 13:29
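The question linked in the comments is about forcing an fsync from Java: flush the Java-side buffers, then call sync() on the underlying file descriptor so the call blocks until the OS has written its page cache to the device. A minimal sketch of that approach (the file name out.txt and chunk size are made up for illustration):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class FsyncDemo {
    public static void main(String[] args) throws IOException {
        FileOutputStream fos = new FileOutputStream("out.txt");
        Writer out = new OutputStreamWriter(fos, StandardCharsets.UTF_8);
        for (int i = 1; i <= 30000; i++) {
            out.write("line " + i + "\n");
            if (i % 10000 == 0) {
                out.flush();        // push Java-side buffers to the OS
                fos.getFD().sync(); // block until the OS has written to the device (fsync)
                System.out.println("lines written " + i);
            }
        }
        out.close();
    }
}
```

With this, each progress message should only appear once its 10,000-line chunk is physically on disk, which is the spacing the question asks for.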

1 Answer


The dirty workaround: every 10,000 lines, close the file and open it again with StandardOpenOption.APPEND.
This has the expected behaviour with only an acceptable time overhead.
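A standalone sketch of this workaround (the output path and line count are made up for illustration):

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ReopenDemo {
    public static void main(String[] args) throws IOException {
        Path outFile = Paths.get("reopen-out.txt"); // made-up path
        Writer out = Files.newBufferedWriter(outFile); // creates/truncates
        int written = 0;
        for (int i = 1; i <= 25000; i++) {
            out.write("line " + i + "\n");
            if (++written % 10000 == 0) {
                // close() flushes Java-side buffers and hands the chunk to the OS;
                // note it does not strictly guarantee the bytes are on the platter
                out.close();
                System.out.println("lines written " + written);
                out = Files.newBufferedWriter(outFile, StandardOpenOption.APPEND);
            }
        }
        out.close();
    }
}
```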

The clean solution: open the file with `StandardOpenOption.SYNC`.
This has the expected behaviour, but with a strong time overhead.
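In code, the SYNC variant might look like this (made-up path and line count; SYNC requires that every write to the underlying file is followed by a synchronous flush of both content and metadata, which is where the overhead comes from):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SyncDemo {
    public static void main(String[] args) throws IOException {
        Path outFile = Paths.get("sync-out.txt"); // made-up path
        try (BufferedWriter out = Files.newBufferedWriter(outFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.SYNC)) {
            for (int i = 1; i <= 1000; i++) {
                out.write("line " + i + "\n");
            }
        }
    }
}
```

Note that the BufferedWriter still batches writes in memory; the SYNC guarantee applies each time a buffer actually reaches the underlying file.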