I'm trying to normalize any newlines or escaped newlines in a string to an escaped unix newline. I cannot figure out why this doesn't work:

Pattern EOL = Pattern.compile("(\\\\r)?\\\\n|\r?\n");
final String escapedEOL = "\\\\n";

System.out.println(EOL.matcher("asdf\njkl;").replaceAll(escapedEOL));
System.out.println(EOL.matcher("asdf\n").replaceAll(escapedEOL));
System.out.println(EOL.matcher("asdf\r\njkl;").replaceAll(escapedEOL));
System.out.println(EOL.matcher("asdf\r\n").replaceAll(escapedEOL));
System.out.println(EOL.matcher("asdf\\r\\njkl;").replaceAll(escapedEOL));        
System.out.println(EOL.matcher("asdf\\r\\n").replaceAll(escapedEOL));

Result:

asdf\njkl;
asdf

asdf\njkl;
asdf\n
asdf\njkl;
asdf\n
Done

Can anyone shed any light on this? I realize I could split this into two calls but now I'm curious...
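For reference, the two-call fallback mentioned above could look roughly like the sketch below. The class name `TwoPassEol` and the method `normalize` are hypothetical names chosen for illustration, and the sketch assumes that an escaped `\n` is already in the target form, so only the escaped `\r\n` sequence and the real line breaks need rewriting.

```java
import java.util.regex.Matcher;

class TwoPassEol {
    // Replacement text: a literal backslash followed by 'n'.
    // quoteReplacement stops replaceAll from treating the backslash specially.
    private static final String ESCAPED_LF = Matcher.quoteReplacement("\\n");

    static String normalize(String input) {
        // Pass 1: collapse an escaped CRLF (the four characters \r\n) to an escaped LF.
        String pass1 = input.replaceAll("\\\\r\\\\n", ESCAPED_LF);
        // Pass 2: turn real CRLF or LF line breaks into an escaped LF.
        return pass1.replaceAll("\r?\n", ESCAPED_LF);
    }

    public static void main(String[] args) {
        System.out.println(normalize("asdf\n"));
        System.out.println(normalize("asdf\r\njkl;"));
        System.out.println(normalize("asdf\\r\\njkl;"));
    }
}
```

Neither pass uses alternation or a quantified group, so this version sidesteps the construct that misbehaves in the single-pattern attempt.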

EDIT: Looks like I should have searched harder for similar problems. It seems quantifiers on groups are best avoided in Java 7. The following pattern, which spells out the alternatives instead of using an optional group, also works:

Pattern EOL = Pattern.compile("\\\\n|\\\\r\\\\n|\r?\n");
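A self-contained sketch of the group-free pattern, run against the six inputs from the question, might look like this. The class name `EolNormalizer` and the helper `normalize` are hypothetical; `Matcher.quoteReplacement` is used only as an alternative way of writing the replacement string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EolNormalizer {
    // Alternatives: escaped LF, escaped CRLF, or a real LF/CRLF. No quantified group.
    private static final Pattern EOL = Pattern.compile("\\\\n|\\\\r\\\\n|\r?\n");

    // A literal backslash followed by 'n', quoted for use as a replacement string.
    private static final String ESCAPED_LF = Matcher.quoteReplacement("\\n");

    static String normalize(String input) {
        return EOL.matcher(input).replaceAll(ESCAPED_LF);
    }

    public static void main(String[] args) {
        String[] samples = {
            "asdf\njkl;", "asdf\n",
            "asdf\r\njkl;", "asdf\r\n",
            "asdf\\r\\njkl;", "asdf\\r\\n"
        };
        for (String sample : samples) {
            System.out.println(normalize(sample));
        }
    }
}
```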

another_dev

2 Answers

I am not sure why, but changing the order of the alternatives in your regex seems to produce the result you probably wanted, so change

Pattern EOL = Pattern.compile("(\\\\r)?\\\\n|\r?\n");

to

Pattern EOL = Pattern.compile("\r?\n|(\\\\r)?\\\\n");

Anyway, this looks more like a bug than desired behaviour, and it was changed in Java 8, so there your original regex also produces

asdf\njkl;
asdf\n
asdf\njkl;
asdf\n
asdf\njkl;
asdf\n
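Since the behaviour reportedly depends on the Java version, a small harness along these lines can show what a given runtime does with the original and the reordered pattern. This is only a sketch: the class name `AlternationOrderCheck` and the helper `apply` are hypothetical, and what it prints for the original pattern depends on the JVM it runs on.

```java
import java.util.regex.Pattern;

class AlternationOrderCheck {
    static final String ORIGINAL = "(\\\\r)?\\\\n|\r?\n";
    static final String REORDERED = "\r?\n|(\\\\r)?\\\\n";

    static String apply(String regex, String input) {
        // As a replacement string, "\\\\n" yields a backslash followed by 'n'.
        return Pattern.compile(regex).matcher(input).replaceAll("\\\\n");
    }

    public static void main(String[] args) {
        String[] samples = {
            "asdf\njkl;", "asdf\n",
            "asdf\r\njkl;", "asdf\r\n",
            "asdf\\r\\njkl;", "asdf\\r\\n"
        };
        System.out.println("java.version = " + System.getProperty("java.version"));
        for (String sample : samples) {
            String reordered = apply(REORDERED, sample);
            // Report whether the original pattern agrees with the reordered one here.
            boolean same = apply(ORIGINAL, sample).equals(reordered);
            System.out.println(reordered + "  (original agrees: " + same + ")");
        }
    }
}
```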
Pshemo

Grouping the left side of the | seems to make things work:

Pattern EOL = Pattern.compile("((\\\\r)?\\\\n)|\r?\n");
ajb
  • +1, this is also an interesting approach, since these outer parentheses shouldn't have any meaning here. – Pshemo Jul 24 '14 at 21:38
  • @Pshemo removing the optional `?` also makes it work: `\\\\r\\\\n|\r?\n` (though of course it changes the meaning). The interesting thing is that the case where the pattern fails (`"asdf\n"`) should actually be matched by the *right* side of the pattern, yet its behaviour changes when the *left* side of the alternation is modified. – dognose Jul 24 '14 at 21:44