Scanner's buffer not representative of entire file when newlines enabled

Question

Basically, what I'm doing is writing to a file and then reading it later. A few times I was looking at the buffer, and seeing lines 'cut off,' and getting concerned it was a flushing issue. However, I stumbled across this question, which states:

So, it appears scanner does not read the entire file at once...it reads file by buffer - which means in chunks.

and I see that reflected in my scanner. Looking at the buffer size, I see 1024 as the size.

However! I was writing each entry as a separate line, passing in the message and appending \n to it before writing. Taking that \n away leads to something interesting: without the newlines, the buffer size has magically increased to something like 5,232, and I can now see the entire contents of the file in the buffer!

The way I make the Scanner is simply new Scanner(new FileInputStream("path.txt")), and then I inspect it using IntelliJ's variable inspection (that's where I got the idea of lines being cut off: I wasn't able to see everything in the file).

Essentially, my question is: why does adding the newlines force the buffer to be a fixed size and obey the rules, and not adding newlines (meaning the entire file is just one line) lets the buffer be whatever size it needs to be?

Can you please provide some code showing how you are creating the `Scanner`? Are you using any regex, or are you calling `nextLine()`, etc.? — ring bearer, Jul 01 '15 at 16:43
@ringbearer I create the scanner from a `FileInputStream` and then inspect it using IntelliJ's inspection — Jeeter, Jul 01 '15 at 16:47

score 2 · Accepted Answer · edited May 23 '17 at 12:30

I suggest, if you only want to read a file, to use the BufferedReader instead of a Scanner. See this Stack Overflow post for more information.
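As a rough sketch of that suggestion, reading a file line by line with a BufferedReader could look like the following. The class name ReadLinesExample, the method readAllLines, and the temporary file are made up for illustration, and UTF-8 is an assumption about how the file was written.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class, not taken from the question.
public class ReadLinesExample {

    // Reads every line of the file. The reader's internal buffer is an
    // implementation detail and does not limit what is returned.
    static List<String> readAllLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        // Assumption: the file is UTF-8 encoded.
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write two entries, one per line, then read them back.
        Path path = Files.createTempFile("entries", ".txt");
        Files.write(path, Arrays.asList("first entry", "second entry"), StandardCharsets.UTF_8);
        System.out.println(readAllLines(path));
        Files.delete(path);
    }
}
```

What you see in the debugger for the reader's internal buffer is again just a window onto the file, so judge the result by the lines returned, not by the buffer contents.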

To answer your question: You're right, the default Scanner buffer size is 1024 (as seen here).

The larger buffer you are seeing is a result of the fact that the scanner has to hold a whole line in its buffer, so it grows the buffer whenever a line is longer than the default 1024 characters. Removing all \n from your file made the scanner think there is only one really long line, which it has to buffer in full.
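If you want to convince yourself that nothing is lost either way, a small experiment along these lines may help. It is only a sketch: the class ScannerLongLineExample and its helper methods are hypothetical, and it reads from an in-memory stream instead of your path.txt. It checks what nextLine() returns rather than peeking at the internal buffer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Scanner;

// Hypothetical example class, not taken from the question.
public class ScannerLongLineExample {

    // Builds a string consisting of n copies of the character c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Returns the length of the first line a Scanner reads from the text.
    static int firstLineLength(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (Scanner scanner = new Scanner(new ByteArrayInputStream(bytes), "UTF-8")) {
            return scanner.nextLine().length();
        }
    }

    public static void main(String[] args) {
        String longLine = repeat('x', 5000);

        // A first line far longer than 1024 characters is still returned whole.
        System.out.println(firstLineLength(longLine + "\nshort\n"));

        // With a short first line, only that short line is returned.
        System.out.println(firstLineLength("short\n" + longLine + "\n"));
    }
}
```

In both cases the scanner hands back complete lines; the buffer shown in the debugger is only the portion of the input it currently needs to hold.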

As you can see here, the buffer size has nearly no effect on efficiency when you are reading files.

Hope I could help you
