I am using BufferedWriter to write text to files in Java, and I pass a custom buffer size to the constructor. The file is written in chunks of whatever size I give (for example, with a buffer size of 8 KB, each write to the file is 8 KB). But when I look at the memory occupied by the BufferedWriter object (using the YourKit profiler), it is actually twice the given buffer size (16 KB in this case).

I looked at the internal implementation to see why this happens, and I see that it creates a char array of the given size. It makes sense that the array occupies twice the buffer size, since each char occupies 2 bytes.

My question is: how does BufferedWriter manage to write only 8 KB at a time in this case, when it is storing 16 KB in the buffer? And is this technically correct, given that each flush writes only 8 KB (half) even though the buffer holds 16 KB?

Stephen C
Ravi

3 Answers

"But I expected all the chars stored in the char array to be written to the file when it reaches the buffer size (which would be 16 KB in my given example)."

8K of chars occupies 16 KB of memory. Correct.

Now let's assume that the chars are actually all in the ASCII subset.

When you write a character stream to an output file in Java, the characters are encoded as a byte stream according to some encoding scheme. (This encoding is performed by stuff inside the OutputStreamWriter class, for example.)

When you encode those 8K characters using an 8-bit character set / encoding scheme such as ASCII or Latin-1 ... or UTF-8 (!!) ... each character is encoded as 1 byte. (For UTF-8 this holds only because we assumed the characters are all ASCII; other characters take 2 to 4 bytes.) Therefore flushing a buffer containing those 8K characters generates an 8K byte write.
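
The arithmetic above can be checked with a small sketch. It does not go through BufferedWriter at all; it only assumes a buffer of 8192 ASCII chars and encodes it directly. The class name EncodedSizeDemo and the helper encodedLength are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodedSizeDemo {
    // Hypothetical helper: number of bytes produced when the chars are
    // encoded with the given charset.
    public static int encodedLength(char[] chars, Charset charset) {
        return new String(chars).getBytes(charset).length;
    }

    public static void main(String[] args) {
        char[] buffer = new char[8192];  // same length as an 8K-char buffer
        Arrays.fill(buffer, 'a');        // assumption: ASCII-only content

        // In memory, each char takes Character.BYTES (2) bytes.
        System.out.println("chars in buffer:  " + buffer.length);
        System.out.println("in-memory bytes:  " + buffer.length * Character.BYTES);

        // Encoded sizes depend on the charset, not on the in-memory size.
        System.out.println("UTF-8 bytes:      "
                + encodedLength(buffer, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes: "
                + encodedLength(buffer, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-16BE bytes:   "
                + encodedLength(buffer, StandardCharsets.UTF_16BE));
    }
}
```

Only the UTF-16BE line should match the in-memory size; the two single-byte results correspond to the 8K writes seen in the question.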

Stephen C

The size you pass to BufferedWriter is the length of its internal char array, so it is counted in chars, not bytes.

public BufferedWriter(Writer out, int sz) {
    super(out);
    if (sz <= 0)
        throw new IllegalArgumentException("Buffer size <= 0");
    this.out = out;
    cb = new char[sz];
    nChars = sz;
    nextChar = 0;

    lineSeparator = java.security.AccessController.doPrivileged(
        new sun.security.action.GetPropertyAction("line.separator"));
}

A single char is not the same as a single byte. How many bytes each char becomes in the file is defined by your character encoding.

Therefore, to get exactly the behaviour you described, you have to switch to another class: BufferedOutputStream, whose internal buffer is sized in bytes.

public BufferedOutputStream(OutputStream out, int size) {
    super(out);
    if (size <= 0) {
        throw new IllegalArgumentException("Buffer size <= 0");
    }
    buf = new byte[size];
}
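
As a rough sketch of the byte-counted approach, the text can be encoded explicitly and pushed through a BufferedOutputStream. RecordingSink and writeThroughByteBuffer are hypothetical names introduced here, and the in-memory sink stands in for a real file stream so that the size of each write is visible.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ByteBufferedDemo {
    // Hypothetical sink that records the size of every write reaching it.
    public static class RecordingSink extends OutputStream {
        public final List<Integer> chunkSizes = new ArrayList<>();
        public long total = 0;

        @Override
        public void write(int b) {
            chunkSizes.add(1);
            total += 1;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            chunkSizes.add(len);
            total += len;
        }
    }

    // Encodes the text as UTF-8 and writes it one byte at a time through a
    // BufferedOutputStream whose buffer is bufferBytes bytes long.
    public static RecordingSink writeThroughByteBuffer(String text, int bufferBytes) {
        RecordingSink sink = new RecordingSink();
        try (OutputStream out = new BufferedOutputStream(sink, bufferBytes)) {
            for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink;
    }

    public static void main(String[] args) {
        // Ten non-ASCII chars: 10 chars in memory, 20 bytes in UTF-8.
        String text = new String(new char[10]).replace('\0', '\u00e9');
        RecordingSink sink = writeThroughByteBuffer(text, 8);
        System.out.println("total bytes written: " + sink.total);
        System.out.println("write sizes: " + sink.chunkSizes);
    }
}
```

Because the buffer here holds bytes that are already encoded, no single write to the sink is larger than the requested buffer size, whatever the characters are.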
Alex
  • Thanks Alex. I understand that a single char is not a single byte. But I expected all the chars stored in the char array to be written to the file when it reaches the buffer size (which would be 16 KB in my given example, since the char array occupies 16 KB). Why is that not happening? Why was it flushing only 8 KB to the file? Please help me understand – Ravi Sep 29 '17 at 07:42
  • It depends on your file encoding. Java uses _UTF-16_ for its internal char storage, which means each char takes 2 bytes in memory. However, if your characters are all ASCII and your file encoding is _UTF-8_, then every character takes only 1 byte in the file, hence the result you see. – Alex Sep 29 '17 at 07:48
  • Agreed with @Alex. See https://stackoverflow.com/questions/7019504/in-what-encoding-is-a-java-char-stored-in. You have 1) the internal memory representation of chars, 2) the char buffer inside the BufferedWriter, and 3) the char-to-byte conversion (which itself probably also uses an internal byte[] buffer), which produces a variable number of bytes per char. This makes memory usage complicated to predict. From a bird's-eye view, 2 to 4 times the size of the char buffer seems reasonable. But in the end, 8 chars may end up writing 8 bytes to a file. Or 16. Or 10... – GPI Sep 29 '17 at 07:51
  • Thanks @Alex. I now understand that the encoding is doing the trick – Ravi Sep 29 '17 at 07:59

It depends on the encoding used to write to the file: ISO-8859-1 stores every character as a single byte, and UTF-8 encodes every ASCII character as a single byte.
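
One way to see this on disk is to write the same text to a temporary file with different charsets and compare the file sizes. This is only a sketch: FileEncodingDemo and writtenFileSize are hypothetical names, and the sample text "caf\u00e9" (three ASCII chars plus one non-ASCII char) is chosen just for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileEncodingDemo {
    // Writes the text to a temporary file with the given charset and
    // returns the resulting file size in bytes.
    public static long writtenFileSize(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                try (BufferedWriter writer = Files.newBufferedWriter(file, charset)) {
                    writer.write(text);
                }
                return Files.size(file);
            } finally {
                Files.deleteIfExists(file);  // clean up the temporary file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";  // 4 chars, i.e. 8 bytes in memory
        System.out.println("ISO-8859-1: "
                + writtenFileSize(text, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      "
                + writtenFileSize(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE:   "
                + writtenFileSize(text, StandardCharsets.UTF_16BE));
    }
}
```

The same four chars should come out as 4 bytes in ISO-8859-1 and 5 bytes in UTF-8, since only the non-ASCII character needs a second byte there.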

Maurice Perry