
I am currently working on software that needs to read text files, with some features that I won't explain here. While testing my code, I found an anomaly that seems to come from the implementation of str.split("\r\n"), where str is a substring of the file's content.

When my substring ends with a succession of "\r\n" (several line breaks), the method completely ignores that trailing part. For example, given the following string:

"\r\nLine 1\r\n\r\nLine 2\r\n\r\n"

I would like to get the following array:

["", "Line 1", "", "Line 2", "", ""]

but it returns:

["", "Line 1", "", "Line 2"]
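The behaviour described above can be reproduced with a minimal sketch like the following. The class and method names (`SplitDefaultDemo`, `splitDefault`) are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

public class SplitDefaultDemo {
    // Hypothetical helper: uses the single-argument split overload.
    static String[] splitDefault(String s) {
        return s.split("\r\n");
    }

    public static void main(String[] args) {
        String s = "\r\nLine 1\r\n\r\nLine 2\r\n\r\n";
        String[] parts = splitDefault(s);
        // The leading empty string is kept; the two trailing ones are dropped.
        System.out.println(parts.length + " elements: " + Arrays.toString(parts));
    }
}
```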

The String.split() Javadoc only mentions this behavior without explaining it:

... Trailing empty strings are therefore not included in the resulting array.

I cannot understand this asymmetry: why did they drop empty strings at the end, but not at the beginning?

0009laH
  • Does this answer your question? [Java String split removed empty values](https://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values) – Matt U Dec 19 '19 at 22:32
  • You can select an answer even after closing. I don't think you can be part of a duplicate chain unless you do :) – Mad Physicist Dec 19 '19 at 22:54

2 Answers


The Javadoc explains how it works; you'd have to ask the authors why they chose this default behavior. Why not just call split(regex, n) as per the docs? Passing -1 as the limit does what you say you want, just as the docs imply.

class Main {
  public static void main(String[] args) {
    String   s = "\r\nLine 1\r\n\r\nLine 2\r\n\r\n";
    // A negative limit keeps trailing empty strings in the result.
    String[] r = s.split("\\r\\n", -1);

    for (int i = 0; i < r.length; i++) {
      System.out.println("i: " + i + " = \"" + r[i] + "\"");
    }
  }
}

Produces:

i: 0 = ""
i: 1 = "Line 1"
i: 2 = ""
i: 3 = "Line 2"
i: 4 = ""
i: 5 = ""
Dave Newton

You missed the part of the doc that explains the "therefore":

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero.

The documentation for the referenced two-argument method says:

If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
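As a rough illustration of the limit rules quoted above, the sketch below splits the string from the question with three different limits. The class and helper names (`SplitLimitDemo`, `splitCrlf`) are hypothetical, and the positive limit of 3 is an arbitrary value chosen for contrast.

```java
public class SplitLimitDemo {
    // Hypothetical helper: splits on CRLF with the given limit.
    static String[] splitCrlf(String s, int limit) {
        return s.split("\r\n", limit);
    }

    public static void main(String[] args) {
        String s = "\r\nLine 1\r\n\r\nLine 2\r\n\r\n";
        // 0: trailing empty strings discarded; -1: everything kept;
        // 3: at most 3 elements, the last one holding the unsplit remainder.
        for (int limit : new int[] {0, -1, 3}) {
            String[] parts = splitCrlf(s, limit);
            System.out.println("limit " + limit + " -> " + parts.length + " elements");
        }
    }
}
```

With this input, limit 0 gives 4 elements, limit -1 gives 6, and limit 3 gives 3, the last of which is the unsplit remainder "\r\nLine 2\r\n\r\n".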

So the default (a limit of zero) is just not the case you want. Call it with a negative limit instead:

str.split("\r\n", -1)

It's unclear why the authors thought 0 would be a more popular use-case than -1, but it doesn't really matter since the option you want exists.
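One detail worth a quick check: the call can be written with the Java string "\r\n" (actual CR and LF characters, matched literally by the regex engine) or with "\\r\\n" (regex escape sequences). A minimal sketch, with the hypothetical names `SplitEscapeDemo` and `sameResult`, suggests both forms split identically:

```java
import java.util.Arrays;

public class SplitEscapeDemo {
    // Hypothetical helper: compares the two ways of writing the CRLF delimiter.
    static boolean sameResult(String s) {
        String[] literal = s.split("\r\n", -1);   // raw CR LF characters in the pattern
        String[] escaped = s.split("\\r\\n", -1); // regex escapes \r and \n
        return Arrays.equals(literal, escaped);
    }

    public static void main(String[] args) {
        String s = "\r\nLine 1\r\n\r\nLine 2\r\n\r\n";
        System.out.println("same result: " + sameResult(s));
    }
}
```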

Mad Physicist
  • Thank you for pointing out the other method :) Nonetheless, I don't understand why they chose to remove trailing empty strings by default instead of leading ones. – 0009laH Dec 19 '19 at 22:36
  • Well, leading spaces are much more likely to be significant if you're parsing a CSV, for example. What I don't understand is why they didn't keep the trailing ones by default. – Mad Physicist Dec 19 '19 at 22:38