16

On HackerRank this line of code occurs very frequently. I think it is there to skip whitespace, but what does the "\r\u2028\u2029\u0085" part mean?

 scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
Mayank Bist
  • 181
  • 1
  • 3
  • 10
  • 2
    This is a regular expression, i.e. a string describing the characters to match for a given operation. In this case the scanner is asked to skip a carriage return / line feed pair, or one of the given Unicode characters (\u2028 is the Unicode code point for the character "Line separator": https://www.fileformat.info/info/unicode/char/2028/index.htm). More info here: https://docs.oracle.com/javase/tutorial/essential/regex/ – spi Aug 31 '18 at 08:07
  • This ignores newlines. U+2028 is such a newline char. – Joop Eggen Aug 31 '18 at 08:22

9 Answers

22

Scanner.skip skips input that matches the pattern. Here the pattern is:

(\r\n|[\n\r\u2028\u2029\u0085])?

  • ? matches zero or one occurrence of the preceding element (here the whole parenthesised group)
  • | separates alternatives
  • [] matches a single character from the set listed inside the brackets
  • \r matches a carriage return
  • \n matches a newline (line feed)
  • \u2028 matches the character with index 2028 base 16 (8232 base 10 or 20050 base 8), case sensitive
  • \u2029 matches the character with index 2029 base 16 (8233 base 10 or 20051 base 8), case sensitive
  • \u0085 matches the character with index 85 base 16 (133 base 10 or 205 base 8), case sensitive

1st Alternative \r\n

  • \r matches a carriage return (ASCII 13)
  • \n matches a line-feed (newline) character (ASCII 10)

2nd Alternative [\n\r\u2028\u2029\u0085]

  • Match a single character present in the list below [\n\r\u2028\u2029\u0085]
  • \n matches a line-feed (newline) character (ASCII 10)
  • \r matches a carriage return (ASCII 13)
  • \u2028 matches the character with index 2028 base 16 (8232 base 10 or 20050 base 8) literally (case sensitive): LINE SEPARATOR
  • \u2029 matches the character with index 2029 base 16 (8233 base 10 or 20051 base 8) literally (case sensitive): PARAGRAPH SEPARATOR
  • \u0085 matches the character with index 85 base 16 (133 base 10 or 205 base 8) literally (case sensitive): NEXT LINE
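
To check the breakdown above empirically, here is a minimal sketch that tests which strings the whole pattern accepts as one line break. The class name LineBreakPatternDemo and the helper isSingleLineBreak are made up for illustration; only java.util.regex.Pattern is real API.

```java
import java.util.regex.Pattern;

class LineBreakPatternDemo {
    // Same pattern as in the question. The escapes in the string literal are
    // already real characters by the time the regex engine sees them.
    static final Pattern LINE_BREAK =
            Pattern.compile("(\r\n|[\n\r\u2028\u2029\u0085])?");

    // Hypothetical helper: true if s is non-empty and the pattern matches all of it.
    // The empty string is excluded because the trailing '?' always matches it.
    static boolean isSingleLineBreak(String s) {
        return !s.isEmpty() && LINE_BREAK.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"\r\n", "\n", "\r", "\u2028", "\u2029", "\u0085", " ", "\n\n"};
        for (String s : samples) {
            System.out.println(s.length() + " char(s): " + isSingleLineBreak(s));
        }
    }
}
```

A plain space and two consecutive newlines are rejected, which is why this call skips at most one line terminator and no other whitespace.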
16

The \r\n alternative is for Windows line endings.

The rest is the standard \r = CR and \n = LF (see "\r\n , \r , \n what is the difference between them?")

Then some Unicode special characters:

u2028 = LINE SEPARATOR (https://www.fileformat.info/info/unicode/char/2028/index.htm)

u2029 = PARAGRAPH SEPARATOR (http://www.fileformat.info/info/unicode/char/2029/index.htm)

u0085 = NEXT LINE (https://www.fileformat.info/info/unicode/char/0085/index.htm)

memo
  • 1,868
  • 11
  • 19
6

OpenJDK's source code shows that nextLine() uses this regex for line separators:

private static final String LINE_SEPARATOR_PATTERN = "\r\n|[\n\r\u2028\u2029\u0085]";
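
As a rough illustration of what that means in practice, the sketch below feeds nextLine() an input that mixes several of these separators. The class and method names are hypothetical, and the input string is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class NextLineSeparatorsDemo {
    // Hypothetical helper: collects everything nextLine() returns for the input.
    static List<String> readLines(String input) {
        List<String> lines = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNextLine()) {
                lines.add(sc.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // CRLF, LINE SEPARATOR and LF are each treated as the end of a line.
        System.out.println(readLines("a\r\nb\u2028c\nd"));
    }
}
```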
Davide
  • 622
  • 4
  • 10
4

The whole thing is a regular expression, so you could simply drop it into https://regexr.com or https://regex101.com/ and it will provide you with a full description of what each part of the regex means.

Here it is for you though:

(\r\n|[\n\r\u2028\u2029\u0085])? / gm

1st Capturing Group (\r\n|[\n\r\u2028\u2029\u0085])?

? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)

1st Alternative \r\n

\r matches a carriage return (ASCII 13)

\n matches a line-feed (newline) character (ASCII 10)

2nd Alternative [\n\r\u2028\u2029\u0085]

Match a single character present in the list below

[\n\r\u2028\u2029\u0085]

\n matches a line-feed (newline) character (ASCII 10)

\r matches a carriage return (ASCII 13)

\u2028 matches the character with index 2028 base 16 (8232 base 10 or 20050 base 8) literally (case sensitive)

\u2029 matches the character with index 2029 base 16 (8233 base 10 or 20051 base 8) literally (case sensitive)

\u0085 matches the character with index 85 base 16 (133 base 10 or 205 base 8) literally (case sensitive)

Global pattern flags

g modifier: global. All matches (don't return after first match)

m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)

As for what scanner.skip does (Scanner Pattern Tutorial):

The java.util.Scanner.skip(Pattern pattern) method skips input that matches the specified pattern, ignoring delimiters. This method will skip input if an anchored match of the specified pattern succeeds. If a match to the specified pattern is not found at the current position, then no input is skipped and a NoSuchElementException is thrown.

I would also recommend reading Alan Moore's answer to "RegEx in Java: how to deal with newline", where he talks about the new ways of doing this in Java 1.8.
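
The NoSuchElementException mentioned in that description is the reason the pattern ends in "?". Because the group is optional, an empty match always succeeds and skip never throws. A minimal sketch of the difference, using a hypothetical helper skipThrows and made-up input strings:

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

class SkipBehaviourDemo {
    // Hypothetical helper: reports whether skip(regex) throws on the given input.
    static boolean skipThrows(String input, String regex) {
        try (Scanner sc = new Scanner(input)) {
            sc.skip(regex);
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Mandatory line break, but the input starts with a letter: skip throws.
        System.out.println(skipThrows("abc", "\n"));
        // Optional line break: the empty match succeeds, nothing is skipped.
        System.out.println(skipThrows("abc", "(\n)?"));
        // Mandatory line break that is actually present: it is skipped.
        System.out.println(skipThrows("\nabc", "\n"));
    }
}
```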

Popeye
  • 11,839
  • 9
  • 58
  • 91
  • https://regex101.com/ gives a pattern error with this regular expression, saying: \u This token is not supported in the selected flavor – Tarun Nov 05 '20 at 06:25
  • https://regexr.com found this link very useful. Generates nice explanation of the expression. – Tarun Nov 05 '20 at 06:26
1
 scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
  1. in Unix and all Unix-like systems, \n is the code for end-of-line; \r means nothing special
  2. as a consequence, in C and most languages that somehow copy it (even remotely), \n is the standard escape sequence for end of line (translated to/from OS-specific sequences as needed)
  3. in old Mac systems (pre-OS X), \r was the code for end-of-line instead
  4. in Windows (and many old OSs), the code for end of line is 2 characters, \r\n, in this order
  5. as a (surprising;-) consequence (harking back to OSs much older than Windows), \r\n is the standard line-termination for text formats on the Internet

\u0085 NEXT LINE (NEL)

\u2029 PARAGRAPH SEPARATOR

\u2028 LINE SEPARATOR

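The whole point of this call is to consume the single leftover line break (if there is one) that remains in the input after a token such as a number has been read from the scanner. It does not skip spaces.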

SarthAk
  • 1,628
  • 3
  • 19
  • 24
1

There's already a similar question here: scanner.skip. It won't skip ordinary whitespace, since the space character (\u0020) is not in the pattern.

\r = CR (Carriage Return) // Used as a new line character in Mac OS before X

\n = LF (Line Feed) // Used as a new line character in Unix/Mac OS X

\r\n = CR + LF // Used as a new line character in Windows

u2028 = line separator

u2029 = paragraph separator

u0085 = next line

Alessandro R
  • 525
  • 7
  • 16
  • there are also similar answers [here](https://stackoverflow.com/a/52111208/1575188) [here](https://stackoverflow.com/a/52111263/1575188) and [here](https://stackoverflow.com/a/52111273/1575188) – fantaghirocco Aug 31 '18 at 08:46
1

This ignores one line break, see \R.

Almost exactly the same could have been done with \R (available since Java 8; it additionally matches vertical tab and form feed) - sigh.

scanner.skip("\\R?");
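
A minimal sketch of the \R variant, assuming Java 8 or later. The class SkipWithRDemo, the helper lineAfterInt and the sample inputs are invented for illustration.

```java
import java.util.Scanner;

class SkipWithRDemo {
    // Hypothetical helper: reads an int, skips at most one line break via \R,
    // then returns the rest of the following line.
    static String lineAfterInt(String input) {
        try (Scanner sc = new Scanner(input)) {
            sc.nextInt();
            sc.skip("\\R?"); // "\\R" in Java source is the regex \R
            return sc.nextLine();
        }
    }

    public static void main(String[] args) {
        System.out.println(lineAfterInt("4\r\nhello"));
        System.out.println(lineAfterInt("4\nhello"));
        // Form feed is matched by \R but not by the HackerRank pattern.
        System.out.println(lineAfterInt("4\fhello"));
    }
}
```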
Joop Eggen
  • 107,315
  • 7
  • 83
  • 138
1

I have a much simpler exercise to explain this

  import java.util.Scanner;

  public class Solution {
      public static void main(String[] args) {
          int i = 4;
          double d = 4.0;
          String s = "HackerRank ";

          Scanner scan = new Scanner(System.in);

          int a;
          double b;
          String c = null;

          a = scan.nextInt();
          b = scan.nextDouble();
          // nextLine() returns only what is left of the line that held the double
          c = scan.nextLine();

          System.out.println(c);
          scan.close();
          System.out.println(a + i);
          System.out.println(b + d);
          System.out.println(s.concat(c));
      }
  }

Try running this first and look at the output.

After that, run this version:

  import java.util.Scanner;

  public class Solution {
      public static void main(String[] args) {
          int i = 4;
          double d = 4.0;
          String s = "HackerRank ";

          Scanner scan = new Scanner(System.in);

          int a;
          double b;
          String c = null;

          a = scan.nextInt();
          b = scan.nextDouble();
          // consume the line break left behind after the double
          scan.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
          c = scan.nextLine();

          System.out.println(c);
          scan.close();
          System.out.println(a + i);
          System.out.println(b + d);
          System.out.println(s.concat(c));
      }
  }

Try this again.

This can be a very tricky interview question.

I was cursing myself before I realised the issue.

Just ask any programmer to read an integer, a double and a string, all from user input.

If they don't know about this, they will most likely fail.

You can find a much simpler explanation of how nextInt() and nextDouble() behave in their Javadocs.
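
For anyone who wants to see the difference without typing input by hand, here is a small sketch that runs both variants against a fixed string instead of System.in. The class LeftoverLineBreakDemo, the helper readThirdValue and the sample input are assumptions made for the example. Locale.US is set only so that "4.0" parses whatever the machine's default locale is.

```java
import java.util.Locale;
import java.util.Scanner;

class LeftoverLineBreakDemo {
    // Hypothetical helper: reads an int and a double, optionally skips one
    // line break, and returns whatever nextLine() yields afterwards.
    static String readThirdValue(String input, boolean skipLineBreak) {
        try (Scanner scan = new Scanner(input)) {
            scan.useLocale(Locale.US);
            scan.nextInt();
            scan.nextDouble();
            if (skipLineBreak) {
                scan.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
            }
            return scan.nextLine();
        }
    }

    public static void main(String[] args) {
        String input = "12\n4.0\nis the best place to learn\n";
        // Without skip: nextLine() returns the empty remainder of the "4.0" line.
        System.out.println("[" + readThirdValue(input, false) + "]");
        // With skip: the leftover line break is consumed first.
        System.out.println("[" + readThirdValue(input, true) + "]");
    }
}
```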

Johann M
  • 11
  • 2
0

It is related to the Scanner class.

Let's suppose you have this input from the system console:

4
This is next line

int a = scanner.nextInt();
String s = scanner.nextLine();

The value of a will be read as 4 and the value of s will be an empty string, because nextLine() only reads what remains on the current line and then moves to the next line.

To read it correctly, you should add one more call to nextLine(), like below:

int a = scanner.nextInt();
scanner.nextLine();
String s = scanner.nextLine();

To make sure the scanner moves on to the next line whatever line terminator the input uses, you can instead write:

scan.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");

The line above does the job on every OS and environment.
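
The two approaches from this answer can be compared side by side with a small sketch. The class, the helper names and the choice of sample terminators below are assumptions made for illustration.

```java
import java.util.Scanner;

class LineTerminatorPortabilityDemo {
    // Approach 1: throw away the rest of the current line with an extra nextLine().
    static String withExtraNextLine(String input) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.nextInt();
            scanner.nextLine();
            return scanner.nextLine();
        }
    }

    // Approach 2: skip at most one line terminator with the pattern from the question.
    static String withSkip(String input) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.nextInt();
            scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
            return scanner.nextLine();
        }
    }

    public static void main(String[] args) {
        // Unix, Windows, classic Mac OS and Unicode LINE SEPARATOR line endings.
        String[] terminators = {"\n", "\r\n", "\r", "\u2028"};
        for (String t : terminators) {
            String input = "4" + t + "This is next line";
            System.out.println(withExtraNextLine(input) + " | " + withSkip(input));
        }
    }
}
```

One design difference to keep in mind: the extra nextLine() also discards anything else left on the line with the number, while skip only removes a line terminator that comes immediately at the current position.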

Rohit Kumar
  • 983
  • 9
  • 11