I understand the concept that we need 2 backslashes when dealing with regex in Java - https://stackoverflow.com/a/1701876/72437

The following code is able to split hello and world without issue.

String message = "hello\nworld";

String[] result = message.split("\\n");

// hello
// world
for (String r : result) {
    System.out.println(r);
}

However, if I use 1 backslash, it works too (it also splits hello and world).

String message = "hello\nworld";

String[] result = message.split("\n");

// hello
// world
for (String r : result) {
    System.out.println(r);
}

I expect using only 1 backslash for regex will not work in Java. But it works. Why is that?

Pshemo
Cheok Yan Cheng
  • "*I expect using only 1 backslash for regex will not work in Java*" but why? `\n` represents a *single* character (at least in bytecode, after compilation), just like `\r`, `\t`, or others, for instance those written in the `\uXXXX` form. – Pshemo Mar 22 '18 at 18:13

4 Answers

With a single backslash, the compiler interprets \n in the string literal as a LINEFEED character. That LINEFEED character is what gets passed to String.split(), so the Java regex engine receives it directly and matches it literally.

With a double backslash, your understanding is right: \\n in source code compiles to the two-character string \n, which is passed to the regex engine, and \n is the regex engine's own escape code for LINEFEED.
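The two levels of escaping can be checked directly. The following is only a minimal sketch; the class name BackslashLevels and the helper splitMessage are made up for illustration, and the message is the one from the question.

```java
import java.util.Arrays;

class BackslashLevels {
    // Hypothetical helper: splits the question's sample message with the given regex string.
    static String[] splitMessage(String regex) {
        return "hello\nworld".split(regex);
    }

    public static void main(String[] args) {
        String literal = "\n";    // after compilation: one char, LINE FEED (U+000A)
        String construct = "\\n"; // after compilation: two chars, backslash and 'n'

        System.out.println(literal.length());   // 1
        System.out.println(construct.length()); // 2

        // Both strings end up matching a line feed once the regex engine has parsed them.
        System.out.println(Arrays.toString(splitMessage(literal)));   // [hello, world]
        System.out.println(Arrays.toString(splitMessage(construct))); // [hello, world]
    }
}
```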

Alanpatchi

Java allows you to specify a newline character in 2 ways.

One is by specifying the character literal \n, just as you would split by a comma or any other character that doesn't need to be regex-escaped.

The other is a special construct for the newline character that Java's regex syntax provides.

(In the "Summary of regular-expression constructs" section of the Pattern Javadoc)

\n The newline (line feed) character ('\u000A')

This is a regular expression construct. It isn't the single character \n; it is a backslash followed by an "n" character, and, as you know, the backslash has to be escaped in Java source as \\.

There is nothing forcing you to use the construct \\n instead of the literal \n.

All this means that you have the option of specifying the character literal \n or using the regular expression construct -- 2 characters -- \\n.

The construct has the advantage of being printable, in case you would ever want to print the pattern you're splitting by.

System.out.println("\\n");  // \n
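To make the "printable" point concrete, here is a small sketch of logging the pattern before splitting. The class PatternLogging and the method describe are hypothetical names used only for this illustration.

```java
class PatternLogging {
    // Hypothetical helper: builds a log line that shows the regex being used.
    static String describe(String regex) {
        return "splitting on [" + regex + "]";
    }

    public static void main(String[] args) {
        // The construct is two visible characters, so the log line stays on one line.
        System.out.println(describe("\\n")); // splitting on [\n]

        // The literal is a real line feed, so the log output is broken across two lines.
        System.out.println(describe("\n"));
    }
}
```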
rgettman

This is a side effect of how regular expressions are read:

message.split("\\n");

This splits the message on the regex \ followed by n, which the regex engine compiles to a literal newline because of its \n escape.

message.split("\n");

This splits the message on the regex <newline>, which also gets compiled to a literal newline.
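A quick way to see that both regexes end up matching the same thing is to compile them with java.util.regex.Pattern and test them against a one-character newline string. This is just a sketch; CompiledNewline and matchesLineFeed are names invented for the example.

```java
import java.util.regex.Pattern;

class CompiledNewline {
    // Hypothetical helper: true if the regex matches a string holding exactly one line feed.
    static boolean matchesLineFeed(String regex) {
        return Pattern.compile(regex).matcher("\n").matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesLineFeed("\\n")); // true: regex escape \n
        System.out.println(matchesLineFeed("\n"));  // true: literal line feed character

        // The source text of the two patterns still differs in length.
        System.out.println(Pattern.compile("\\n").pattern().length()); // 2
        System.out.println(Pattern.compile("\n").pattern().length());  // 1
    }
}
```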

Ferrybig

"\n" passes the single character ASCII 10 as the regex.

"\\n" passes a string of length 2 as the regex: a backslash followed by n.

The two do not mean the same thing, but they produce the same result.
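One situation where the two forms behave differently, as far as I know, is Pattern.COMMENTS mode, in which unescaped whitespace in the pattern is ignored. The sketch below assumes that documented behavior; the class CommentsModeDifference and its helper are hypothetical names. String.split() with a plain string does not enable this flag, so the code in the question is unaffected.

```java
import java.util.regex.Pattern;

class CommentsModeDifference {
    // Hypothetical helper: compiles the regex with the given flags and
    // checks whether it matches a string holding exactly one line feed.
    static boolean matchesLineFeed(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher("\n").matches();
    }

    public static void main(String[] args) {
        // Without flags, both forms match a line feed.
        System.out.println(matchesLineFeed("\n", 0));   // true
        System.out.println(matchesLineFeed("\\n", 0));  // true

        // In COMMENTS mode the literal line feed is skipped as whitespace,
        // leaving an empty pattern, while the \n construct still matches.
        System.out.println(matchesLineFeed("\n", Pattern.COMMENTS));  // false
        System.out.println(matchesLineFeed("\\n", Pattern.COMMENTS)); // true
    }
}
```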

Jean-Baptiste Yunès