2

I can't find the documentation that specifies how a Scanner treats newline patterns by default. I want to read a file line by line and have the scanner be able to handle \r, \n or \r\n line endings regardless of the system the program is actually running on.

If I declare a scanner like so:

Scanner scanner = new Scanner(reader);

what is the default behaviour? Will it handle all three kinds as described above or do I have to tell it explicitly to do it?

skaffman
Anthony
  • Try it and see what happens. Create 3 different files each using a different EOL string. – camickr May 07 '11 at 03:29
  • 3
    Actually, this is a case where "try it and see" is not going to give you the full answer! – Stephen C May 07 '11 at 04:16
  • 1
    @Stephen C, the question asked if it handles "\r", "\n" or "\r\n". This is easily tested and verified. Yes, the full answer is that it also supports Unicode characters, but that was not the question. The proper question would have been "by testing the Scanner I've noticed it supports \r, \n and \r\n; does it support anything else?". The poster took the lazy way out and will never learn simple problem-solving techniques if people keep spoonfeeding answers that are easily tested. – camickr May 07 '11 at 04:52
  • @Duracell, "A little testing showed that this is true.", which you should have done before posting the question! – camickr May 07 '11 at 04:53
  • @camickr: As Stephen C pointed out, try and see doesn't give the full answer. It's all well and good to see if it works on MY machine, but I needed a more definitive answer. – Anthony May 07 '11 at 04:55
  • @Duracell, Java is supposed to be cross-platform, so why wouldn't it work? Again, if you were concerned, your question should have been, "I've tested it on my machine and it supports the 3 EOL strings as expected. Can anyone confirm if this will work on all platforms?". The point is to show you made an effort and you are asking for confirmation, which is why you do the test "before", not "after" asking a question! You could also have looked at the source code; that's all anybody else did. Nobody else ran the code on all different platforms. – camickr May 07 '11 at 05:16
  • @camickr - actually ... if you read the question carefully ... he also asked *"what is the default"* behavior. As I said ... "try it and see" is NOT the full answer ... TO THE OP's QUESTION. Besides, it is a poor answer / approach in general. Running the code on different platforms would not have told you that there were three other line termination sequences. – Stephen C May 07 '11 at 05:19
  • The OP's requirement was to support the 3 different EOL strings that he was aware of. That is easy to test. Read the OP's reply to the answer he accepted, which was posted by David. The fact that it supports other EOL strings could be positive or negative depending on the OP's requirements. I never said "try and see" was the full answer. Again, the point of my comment was for the OP to do some work himself BEFORE posting a question like this. Then you can list your results and ask for confirmation or clarification if something doesn't work as you expect. – camickr May 07 '11 at 15:08
  • 1
    @camickr: As far as I can see, camickr, you are not helping this discussion. Both the question by Duracell and the answers given by David and Stephen C are valid. Just trying it is not the solution, especially since Scanner does not clearly define what a line separator is, so it may differ between implementations of the Java runtime. – Maarten Bodewes Jun 15 '11 at 12:34

2 Answers

5

Looking at the source code for Sun JDK 1.6, the pattern used is "\r\n|[\n\r\u2028\u2029\u0085]"

which matches "\r\n", or any single one of \r, \n, or the Unicode characters "line separator" (\u2028), "paragraph separator" (\u2029), and "next line" (\u0085).
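If you want to check this yourself, here is a minimal sketch that pushes a string with mixed terminators through `nextLine()`. The class and method names (`ScannerLineEndings`, `readLines`) are made up for illustration, and it assumes your runtime uses the same line pattern as the Sun JDK 1.6 source quoted above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of the JDK.
class ScannerLineEndings {

    // Collects every line that Scanner.nextLine() returns for the given text.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(new StringReader(text))) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // One string mixing CR, LF, CRLF and the three Unicode terminators.
        String text = "a\rb\nc\r\nd\u2028e\u2029f\u0085g";
        System.out.println(readLines(text)); // prints [a, b, c, d, e, f, g]
    }
}
```

Note that this goes through `hasNextLine()`/`nextLine()`. The token methods (`hasNext()`/`next()`) split on the scanner's delimiter, which is whitespace by default, so they are a separate mechanism from the line pattern.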

David
  • Thanks, that's what I needed to know. A little testing showed that this is true. If I don't specify a delimiter and use `scanner.hasNext` then it will treat `\r`, `\n` and `\r\n` as line endings. – Anthony May 07 '11 at 04:13
3

It is not documented (in Java 1.6) but the JDK code uses this regex to match a line break:

"\r\n|[\n\r\u2028\u2029\u0085]"

Here's a link to the source code: http://cr.openjdk.java.net/~briangoetz/7012540/webrev/src/share/classes/java/util/Scanner.java.html

IMO, this ought to be specified, since Scanner's behavior with respect to line separators differs from (for example) BufferedReader's. (I've lodged a bug report ...)
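To make the difference concrete, here is a rough sketch that reads the same text with both classes. `BufferedReader.readLine()` is documented to end a line only on \n, \r or \r\n, so a \u2028 stays inside the line, whereas a Scanner using the pattern above splits on it. The names `ScannerVsBufferedReader`, `withScanner` and `withBufferedReader` are invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical comparison class, not part of the JDK.
class ScannerVsBufferedReader {

    // Lines as seen by Scanner.nextLine().
    static List<String> withScanner(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(new StringReader(text))) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    // Lines as seen by BufferedReader.readLine().
    static List<String> withBufferedReader(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked to keep the demo short.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // LF between a and b, Unicode "line separator" between b and c.
        String text = "a\nb\u2028c";
        System.out.println(withScanner(text).size());        // prints 3
        System.out.println(withBufferedReader(text).size()); // prints 2
    }
}
```

If you need BufferedReader-style behaviour from a Scanner, one option is to set an explicit delimiter with `useDelimiter` and read tokens with `next()` instead of relying on `nextLine()`.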

Stephen C
  • This wouldn't happen to be Stephen Crawley, would it? – Anthony May 07 '11 at 04:27
  • Phew. I am currently taking a software design course in Java with a lecturer by the name of Stephen C. Got me excited for a second. – Anthony May 07 '11 at 04:36
  • Could you point to the bug report? I cannot find it in the Oracle bug database. I think it should certainly be in the API since it doesn't use the platform line separator. – Maarten Bodewes Jun 15 '11 at 12:52