
The Javadoc for java.util.regex.Pattern says the following:

UNIX_LINES

public static final int UNIX_LINES

Enables Unix lines mode.

In this mode, only the '\n' line terminator is recognized in the behavior of ., ^, and $.

Unix lines mode can also be enabled via the embedded flag expression (?d).

Can anybody explain in other words what this flag is for? I understood it to mean that the "\n" escape sequence is recognized only after ., ^, and $. Apparently I have misunderstood.

nhahtdh
Rollerball
  • You may want to refer to http://stackoverflow.com/questions/1279779/what-is-the-difference-between-r-and-n – devnull Apr 17 '13 at 15:48
  • Maybe http://en.wikipedia.org/wiki/Newline as well. – devnull Apr 17 '13 at 15:48
  • @devnull Can I have a quick example? I understand the difference between \n and \r and so forth but when I try something like pattern:"abc$" matchengine:"abc\n" it appears to not be working – Rollerball Apr 17 '13 at 15:58

2 Answers


I will explain it using . since the same rule applies to ^ and $.

By default, the dot . matches any character except a line terminator. On Unix, only \n marks a new line, so in Unix lines mode other characters such as carriage return \r are treated as normal characters.

Take a look at the String "A\r\nB\rC\nD". If you try to find matches for a regex like .+ using

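import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Input mixes \r\n, a lone \r and a lone \n
String data = "A\r\nB\rC\nD";
System.out.println(data);
// Default mode: . stops at every line terminator
Matcher m = Pattern.compile(".+").matcher(data);
while (m.find()) {
    System.out.println("["+m.group()+"]");
}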

you will get

[A]
[B]
[C]
[D]

but if you add the flag Pattern.UNIX_LINES, characters like \r also become possible matches for . and the output changes to

[A
]
[B
C]
[D]

So the first match is [A\r], the second is [B\rC] and the third is [D].
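To see the same rule for ^ and $, here is a small sketch of my own (the class name UnixLinesAnchors and the helper findAll are made up for illustration). It assumes MULTILINE is switched on so that the anchors can match at inner line boundaries, and it reuses the same input string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnixLinesAnchors {
    // Collects every match of regex in input, compiled with the given flags.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "A\r\nB\rC\nD";
        int multi = Pattern.MULTILINE;
        int unix = Pattern.MULTILINE | Pattern.UNIX_LINES;

        // ^ after any line terminator vs. only after \n
        System.out.println(findAll("^\\w", multi, data)); // [A, B, C, D]
        System.out.println(findAll("^\\w", unix, data));  // [A, B, D]

        // $ before any line terminator vs. only before \n (or at end of input)
        System.out.println(findAll("\\w$", multi, data)); // [A, B, C, D]
        System.out.println(findAll("\\w$", unix, data));  // [C, D]
    }
}
```

With UNIX_LINES the lone \r no longer starts or ends a line, so C is not at a line start and A and B are not at a line end.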

Pshemo

As for how the flag applies to regex behavior: ., ^, and $ all depend on the definition of a line terminator to function.

  • . matches anything but a line break
  • ^ can match the beginning of a line
  • $ can match the end of a line.

Each of these depends on a correct definition of where a line terminates. The UNIX_LINES setting instructs the engine to define the line terminator strictly per the standard Unix definition, that is \n only. By default, it is defined more broadly (\n, \r\n, \r, \u0085, \u2028 and \u2029), as seen in the Pattern docs.
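A quick way to check which characters count as line terminators in each mode is to ask whether a single . matches them. This is only a sketch; the class name LineTerminatorCheck and the helper dotMatches are my own names, not part of the API.

```java
import java.util.regex.Pattern;

public class LineTerminatorCheck {
    // True if a single '.' matches the whole one-character input under the given flags.
    static boolean dotMatches(String ch, int flags) {
        return Pattern.compile(".", flags).matcher(ch).matches();
    }

    public static void main(String[] args) {
        String[] names = {"\\n", "\\r", "\\u0085", "\\u2028", "\\u2029"};
        String[] chars = {"\n", "\r", "\u0085", "\u2028", "\u2029"};
        for (int i = 0; i < chars.length; i++) {
            System.out.println(names[i]
                    + " default=" + dotMatches(chars[i], 0)
                    + " UNIX_LINES=" + dotMatches(chars[i], Pattern.UNIX_LINES));
        }
        // The embedded flag (?d) has the same effect as Pattern.UNIX_LINES.
        System.out.println(Pattern.matches("(?d).", "\r")); // true
    }
}
```

By default none of the five characters is matched by the dot. With UNIX_LINES only \n is still excluded.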

As for matching "abc\n", I assume you are using Pattern.matches, or something like it, which must match the entire input? ^ and $ are zero-width. They can match on either side of a newline, but will not consume the newline character, so abc$ leaves the trailing \n unmatched and a whole-input match fails. You can consume the \n by simply putting it in your pattern, such as abc\n, or you could also use the $ character somewhat as you indicated, like abc\n$, or if you're feeling frisky (?m)abc$$$$\n$$.
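Here is a minimal sketch of that difference, assuming the input is the four-character string "abc\n". The class AnchorVsConsume and its two helpers are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

public class AnchorVsConsume {
    // Whole-input match, as Pattern.matches / Matcher.matches() require.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Substring search, as Matcher.find() does.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "abc\n";
        // $ asserts a position but does not consume the trailing \n.
        System.out.println(fullMatch("abc$", input));     // false
        System.out.println(partialMatch("abc$", input));  // true
        // Consuming the \n explicitly lets the whole input match.
        System.out.println(fullMatch("abc\\n", input));   // true
        System.out.println(fullMatch("abc\\n$", input));  // true
    }
}
```

So find() succeeds with abc$ because $ can sit just before the final line terminator, while matches() fails because one character is left over.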

DOTALL and MULTILINE modes might also be of use to you, depending on what you are trying to accomplish.
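For comparison, a short sketch of how DOTALL differs from UNIX_LINES on the sample string from the other answer. The class name DotallCompare and the helper wholeMatch are invented for this example.

```java
import java.util.regex.Pattern;

public class DotallCompare {
    // True if regex matches the entire input under the given flags.
    static boolean wholeMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String data = "A\r\nB\rC\nD";
        // Default: . stops at every line terminator.
        System.out.println(wholeMatch("A.+D", 0, data));                  // false
        // UNIX_LINES: . crosses \r but still stops at \n.
        System.out.println(wholeMatch("A.+D", Pattern.UNIX_LINES, data)); // false
        // DOTALL: . matches every character, line terminators included.
        System.out.println(wholeMatch("A.+D", Pattern.DOTALL, data));     // true
    }
}
```

UNIX_LINES only narrows what counts as a line terminator, whereas DOTALL makes the dot ignore line terminators altogether.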

femtoRgon