If you split your string with the following regex (shown here as a Java string literal), it works as intended:
(?<=\\r\\n|\\r(?!\\n)|\\n)
The problem with your regex is that when \r\n is encountered, the lookbehind assertion (?<=\r) is already true at the position between \r and \n, so the string is split just after \r, separating it from the \n.
This is why I added a negative lookahead (?!\n) after \r: it requires that the character following \r is not \n. This prevents a split between \r and \n and keeps the pair together.
Demo: https://regex101.com/r/H6PNmY/1/ (where I replaced \r with a and \n with b for readability)
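The same readability trick can be reproduced in Java. This is a minimal sketch, not the demo's actual text: the sample sentence and the class name ReadableDemo are made up, and the sentence deliberately avoids the letters a and b elsewhere so that only the stand-ins for \r and \n trigger a split.

```java
public class ReadableDemo {
    // Same structure as the real regex, with a standing in for \r and b for \n
    static final String READABLE = "(?<=ab|a(?!b)|b)";

    public static void main(String[] args) {
        // Hypothetical sample text: "ab" plays \r\n, a lone "a" plays \r, a lone "b" plays \n
        String input = "1 dog ab 2 fig, 1 cow a 2 kites, 1 lemon b 2 plums";
        for (String piece : input.split(READABLE)) {
            System.out.println("[" + piece + "]");
        }
    }
}
```

It should print [1 dog ab], [ 2 fig, 1 cow a], [ 2 kites, 1 lemon b] and [ 2 plums] on separate lines: the "ab" pair is never split.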
When you put this back in your code:
String input = "1 dog \r\n 2 cat, 1 car \r 2 planes, 1 apple \n 2 peaches";
// Split after each line terminator (\r\n, a lone \r, or \n), keeping it at the end of the piece
String[] output = input.split("(?<=\\r\\n|\\r(?!\\n)|\\n)");
for (int i = 0; i < output.length; i++) {
    printASCII(output[i]);
    System.out.println("===");
}
with printASCII defined as:
public static void printASCII(String in) {
    // Print each character followed by its numeric (ASCII) code
    for (int i = 0; i < in.length(); i++) {
        System.out.println("The ASCII value of " + in.charAt(i) + " = " + (int) in.charAt(i));
    }
}
It gives you the following output:
The ASCII value of 1 = 49
The ASCII value of = 32
The ASCII value of d = 100
The ASCII value of o = 111
The ASCII value of g = 103
The ASCII value of = 32
The ASCII value of
= 13
The ASCII value of
= 10
===
The ASCII value of = 32
The ASCII value of 2 = 50
The ASCII value of = 32
The ASCII value of c = 99
The ASCII value of a = 97
The ASCII value of t = 116
The ASCII value of , = 44
The ASCII value of = 32
The ASCII value of 1 = 49
The ASCII value of = 32
The ASCII value of c = 99
The ASCII value of a = 97
The ASCII value of r = 114
The ASCII value of = 32
The ASCII value of
= 13
===
The ASCII value of = 32
The ASCII value of 2 = 50
The ASCII value of = 32
The ASCII value of p = 112
The ASCII value of l = 108
The ASCII value of a = 97
The ASCII value of n = 110
The ASCII value of e = 101
The ASCII value of s = 115
The ASCII value of , = 44
The ASCII value of = 32
The ASCII value of 1 = 49
The ASCII value of = 32
The ASCII value of a = 97
The ASCII value of p = 112
The ASCII value of p = 112
The ASCII value of l = 108
The ASCII value of e = 101
The ASCII value of = 32
The ASCII value of
= 10
===
The ASCII value of = 32
The ASCII value of 2 = 50
The ASCII value of = 32
The ASCII value of p = 112
The ASCII value of e = 101
The ASCII value of a = 97
The ASCII value of c = 99
The ASCII value of h = 104
The ASCII value of e = 101
The ASCII value of s = 115
===
This shows that the EOL characters are kept at the end of each piece, as you requested.
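The per-character dump above is thorough but long. A more compact way to check the same thing is to print each piece with its line terminators escaped. This is a minimal sketch; the class EolCheck and the helpers visible and splitKeepingEol are hypothetical names, not part of your code.

```java
public class EolCheck {
    // Hypothetical helper: shows \r and \n as the two-character sequences \r and \n
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Hypothetical wrapper around the regex from this answer
    static String[] splitKeepingEol(String input) {
        return input.split("(?<=\\r\\n|\\r(?!\\n)|\\n)");
    }

    public static void main(String[] args) {
        String input = "1 dog \r\n 2 cat, 1 car \r 2 planes, 1 apple \n 2 peaches";
        for (String piece : splitKeepingEol(input)) {
            // Brackets make leading and trailing spaces visible too
            System.out.println("[" + visible(piece) + "]");
        }
    }
}
```

For your input it should print [1 dog \r\n], [ 2 cat, 1 car \r], [ 2 planes, 1 apple \n] and [ 2 peaches], one per line, with the backslashes printed literally.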
ASCII table: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.networkcomm/conversion_table.htm