Using Eclipse on Windows, I'm trying to split a text into two parts: everything from the start up to the first line break, and the rest of it.

String[] result = resumen.split("\\R", 2);
String firstpart = result[0];
String rest = result[1];

This works fine.

But on a Linux machine I'm getting the error:

Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 1
\R

So I read somewhere on SO that I can use:

String[] result = resumen.split("\\\\R", 2);

But this doesn't work as expected: it doesn't split the text.

How can I adapt the code so that it works on the Linux machine too?

Thanks in advance.

Avión

1 Answer

Sounds to me like the Linux machine has an older version of Java, and \R was added after that version.

\R is in Java 8. It's not in Java 7.
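
One way to confirm this on the Linux machine is to print the running Java version and try to compile the pattern there. This is only a minimal sketch; the class name `RCheck` and the helper `supportsLinebreakMatcher` are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RCheck {
    // Hypothetical helper: true if this JVM's regex engine accepts \R.
    public static boolean supportsLinebreakMatcher() {
        try {
            Pattern.compile("\\R");
            return true;
        } catch (PatternSyntaxException e) {
            // Older runtimes reject \R as an illegal/unsupported escape sequence.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("\\R supported: " + supportsLinebreakMatcher());
    }
}
```

If the second line prints `false`, the runtime predates `\R` and the expanded pattern is needed.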

If you need to support Java 7, the docs say \R is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029].

So based on that:

String[] result = resumen.split("\\u000D\\u000A|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]", 2);

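Alternatively, you can hand those characters to the regex engine as literal characters rather than regex-level escapes. Don't write \u000A or \u000D inside a Java string literal, though: the compiler translates Unicode escapes before it parses the source, so they turn into real line terminators inside the literal and the code won't compile. Use \n and \r for those two (and \f for \u000C if you like):

String[] result = resumen.split("\r\n|[\n\u000B\f\r\u0085\u2028\u2029]", 2);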
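
Putting the Java 7-compatible pattern into a small helper might look like the sketch below. The class name `FirstLineSplitter` and the method `splitFirstLine` are hypothetical, and returning an empty string for the rest when there is no line break is an assumption about what you want; the original `result[1]` would throw an `ArrayIndexOutOfBoundsException` in that case.

```java
import java.util.regex.Pattern;

public class FirstLineSplitter {
    // Java 7-compatible stand-in for \R, using the equivalence quoted from the docs.
    private static final Pattern LINE_BREAK = Pattern.compile(
            "\\u000D\\u000A|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]");

    // Returns {firstLine, rest}; rest is "" when the text contains no line break
    // (that fallback is an assumption, not something the question specifies).
    public static String[] splitFirstLine(String text) {
        String[] parts = LINE_BREAK.split(text, 2);
        String rest = parts.length > 1 ? parts[1] : "";
        return new String[] { parts[0], rest };
    }

    public static void main(String[] args) {
        String[] r = splitFirstLine("first\r\nsecond\nthird");
        System.out.println("first part: " + r[0]);
        System.out.println("rest: " + r[1]);
    }
}
```

Compiling the pattern once into a constant avoids recompiling it on every call, which `String.split` would otherwise do.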

T.J. Crowder
  • @XtremeBaumer: Not if the goal is to match line breaks cross-platform. – T.J. Crowder Dec 02 '16 at 08:58
  • @XtremeBaumer: Windows, but it's not so much platform as string contents. If you have a string with Windows-style linebreaks in it, splitting on `\n` will leave the `\r` in the resulting strings. And looking at the `\R` equivalent, there are more exotic line breaks as well. – T.J. Crowder Dec 02 '16 at 09:04
  • How come, if I read a text file with 2 lines and use `System.out.print(line);`, I don't get 2 lines as output? – XtremeBaumer Dec 02 '16 at 09:18
  • @XtremeBaumer: It depends on how you're reading the file; sounds like line breaks were normalized. But again, it's about string contents. `"foo\r\nbar".split("\\n")[0].length()` is 4, not 3. `"foo\r\nbar".split("\\R")[0].length()` is 3 as desired. – T.J. Crowder Dec 02 '16 at 09:20
  • Why not use `System.getProperty("line.separator")` to get the right newline character? – infotoni91 Dec 02 '16 at 09:59
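
The points raised in the comments can be checked with a short sketch. `line.separator` describes the platform the JVM is running on, not the contents of the string, so it only helps when the text was produced with that same separator. The class and method names here are hypothetical.

```java
import java.util.regex.Pattern;

public class SeparatorDemo {
    // First piece of the text when splitting on a bare "\n".
    public static String firstOnNewline(String text) {
        return text.split("\\n", 2)[0];
    }

    // First piece when splitting on the running JVM's platform separator.
    public static String firstOnPlatformSeparator(String text) {
        String sep = System.getProperty("line.separator");
        // Quote the separator so it is matched literally by the regex engine.
        return text.split(Pattern.quote(sep), 2)[0];
    }

    public static void main(String[] args) {
        String windowsStyle = "foo\r\nbar";
        // Splitting on "\n" leaves the "\r" attached to the first piece.
        System.out.println(firstOnNewline(windowsStyle).length());
        // This result depends on the platform the program runs on.
        System.out.println(firstOnPlatformSeparator(windowsStyle).length());
    }
}
```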