1

I am writing lots of strings out to a file and noticed that at some point, write operations started taking more time than before. Most of the strings are unique and are generated at run-time using a StringBuilder, so I thought that was the issue, but it turns out there are other reasons.

I wrote a quick program to see what's going on

import java.io.BufferedWriter;
import java.io.FileWriter;

public class WriteTimer {

    public static void main(String[] args) {
        long time, t1, t2;
        int n = 10000;
        int threshold = 10; // only report writes that take longer than this (ms)
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter("C:\\temp\\out.txt"));

            for (int i = 0; i < n; i++) {
                t1 = System.currentTimeMillis();
                out.write("test\r\n");
                t2 = System.currentTimeMillis();
                time = t2 - t1;
                if (time > threshold) {
                    System.out.println(time);
                }
            }
            out.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I put in a threshold to filter out write operations that take minimal time. I set it to 10 milliseconds.

When n = 10 000, nothing is printed out for me, which means the writes are fast. As I increase n to 100 000, 1 000 000, and 10 000 000, a couple of numbers are printed out. Then at 100 000 000 I start seeing lots of numbers being printed out. At 1 000 000 000, many write operations take several tens to hundreds of milliseconds, which greatly reduces throughput.

There are likely many different reasons why this happens, such as my using a spinning disk drive or disk fragmentation. I've tried increasing the buffer size to 1 MB or 10 MB, but it didn't seem to help (in fact, it seemed to make things worse).
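For reference, this is roughly how I set the larger buffer; the 1 MB size and the path are just the values from my test, and it uses the same imports as the program above:

// Use the two-argument BufferedWriter constructor to request a bigger buffer
// (the default is 8192 characters); 1 MB shown here as an example.
BufferedWriter out = new BufferedWriter(
        new FileWriter("C:\\temp\\out.txt"),
        1024 * 1024);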

Is there anything I can do to avoid this sudden drop in throughput over time?

MxLDevs
  • 19,048
  • 36
  • 123
  • 194
  • Using a `StringBuffer` or a `StringBuilder`? – tilpner May 21 '14 at 18:39
  • 1
    I don't think logging the number of individual write operations that take more than 10ms is a good way of measuring throughput, especially when there's buffering involved. Better to measure the average write speed over time. – Alex May 21 '14 at 18:41
  • @Alex do you mean that, because of buffering, some writes would take no time at all since it's just writing to memory, while other writes would take more time because the buffer's full and needs to be flushed to disk? – MxLDevs May 21 '14 at 18:42
  • Yes, exactly. Take a look at the BufferedWriter source code. – Alex May 21 '14 at 18:49
  • Will try the suggestions at http://stackoverflow.com/questions/1605332/java-nio-filechannel-versus-fileoutputstream-performance-usefulness, but I am writing UTF-8, so binary writing might cause an issue. – MxLDevs May 21 '14 at 19:03

2 Answers

3

Most operating systems, e.g. Windows and Linux, allow you to have uncommitted writes to disk. E.g. you can write up to 10% of your main memory ahead of what is actually on disk. This works very fast; however, once this threshold is reached, you can only write at the speed your disk can sustain.

Is there anything I can do to avoid this sudden drop in throughput over time?

  • increase the dirty threshold; on Linux this is fairly easy.
  • increase the amount of memory you have.
  • increase the disk transfer rate.
  • write to a compressed file, to reduce the amount of data written (see the sketch below).
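
A minimal sketch of the compressed-file idea, assuming GZIP is acceptable and reusing the test string and path from the question (with a .gz suffix added); the class name is just for illustration:

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CompressedWriterSketch {
    public static void main(String[] args) throws IOException {
        // Wrapping the file stream in a GZIPOutputStream means far less data
        // actually reaches the disk for highly repetitive text.
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream("C:\\temp\\out.txt.gz")),
                StandardCharsets.UTF_8))) {
            for (int i = 0; i < 1_000_000; i++) {
                out.write("test\r\n");
            }
        } // closing the writer finishes the GZIP stream and flushes everything
    }
}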
Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
1

When writing small files using BufferedWriter, the buffer is not flushed to disk until the writer is closed, which your benchmark does not measure at all. Most writes will operate on an in-memory buffer, which will be very fast. To get a better picture of performance, you'd want to start your timer before the very first write, stop it after the call to close() completes, and divide the total size of the written data by that elapsed time to get a measure of average throughput.
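A rough sketch of that measurement, assuming the same file and test string as in the question (the class name and the MB/s formatting are just for illustration):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class ThroughputSketch {
    public static void main(String[] args) throws IOException {
        int n = 100_000_000;
        String line = "test\r\n";

        long start = System.currentTimeMillis();
        try (BufferedWriter out = new BufferedWriter(new FileWriter("C:\\temp\\out.txt"))) {
            for (int i = 0; i < n; i++) {
                out.write(line);
            }
        } // close() flushes the remaining buffer, so it must be inside the timed region
        long elapsedMs = Math.max(System.currentTimeMillis() - start, 1); // avoid divide-by-zero on tiny runs

        long bytesWritten = (long) n * line.length(); // 1 byte per char for this ASCII string
        double mbPerSecond = (bytesWritten / (1024.0 * 1024.0)) / (elapsedMs / 1000.0);
        System.out.println("Average throughput: " + mbPerSecond + " MB/s");
    }
}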

Alex
  • 13,811
  • 1
  • 37
  • 50