7

Can anyone help me understand how this line of code works?

String s = new Scanner(new URL("http://example.com").openStream(), "UTF-8").useDelimiter("\\A").next();

The code reads the content of a webpage directly into a string. How exactly is the Scanner object converted to a String, and why do we use a delimiter?

Thanks.

Nuhman
    It is not "converted". The `next()` method returns a `String`. Have a look at the Javadoc of Scanner. [useDelimiter](https://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html#useDelimiter(java.lang.String)) is also described there. – Fildor Mar 29 '16 at 07:46

3 Answers

15

Here is what happens, with some abuse of indentation:

     new Scanner(                           // A new Scanner is created
             new URL("http://example.com")  // on a stream that is
                                            // obtained from a URL
          .openStream(),                    // - openStream() returns an InputStream
       "UTF-8")                             // The scanner decodes the bytes of
                                            // the stream into characters
                                            // using the UTF-8 charset

     .useDelimiter("\\A")                   // The delimiter is set to the
                                            // regular expression \A, which
                                            // matches only the beginning of the input

   .next();                                 // Returns the next token. Since \A
                                            // can match only at the very start,
                                            // no delimiter occurs later on, so
                                            // the token is the entire content
                                            // of the stream

From the Java documentation:

A simple text scanner which can parse primitive types and strings using regular expressions. A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types using the various next methods.

So you just replaced the default whitespace delimiter with \A. But \A has a specific meaning when evaluated as a regular expression: it matches the beginning of the input, not the literal characters backslash and A.
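To make the difference concrete, here is a minimal sketch that compares the default delimiter with \A. It reads from an in-memory stream instead of a URL so that it runs without a network connection; the class name `DelimiterDemo` and the sample text are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class DelimiterDemo {
    // Hypothetical sample text standing in for the content of the webpage.
    static final String SAMPLE = "hello world!\nGoodbye!";

    // In-memory stand-in for new URL(...).openStream().
    static InputStream sampleStream() {
        return new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
    }

    // Default delimiter (whitespace): next() stops at the first space.
    static String firstTokenDefault() {
        try (Scanner sc = new Scanner(sampleStream(), "UTF-8")) {
            return sc.next();
        }
    }

    // Delimiter \A: it can only match at the start of the input,
    // so the first token runs to the end of the stream.
    static String firstTokenStartOfInput() {
        try (Scanner sc = new Scanner(sampleStream(), "UTF-8").useDelimiter("\\A")) {
            return sc.next();
        }
    }

    public static void main(String[] args) {
        System.out.println("default: [" + firstTokenDefault() + "]");
        System.out.println("\\A:      [" + firstTokenStartOfInput() + "]");
    }
}
```

With the default delimiter the first token is just the first word, while with \A it is the whole text, line break included.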

If your stream contains only the following text:

\Ahello world!\A Goodbye!\A

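Your code will return the entire line \Ahello world!\A Goodbye!\A, because the regular expression \A does not match the literal characters in the text.

If you wanted to split on the literal sequence of a backslash followed by an uppercase A, you should use "\\\\A" as the delimiter.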
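A small sketch of that difference, assuming the text is held in a string rather than read from a stream. The class `LiteralDelimiterDemo` and the helper `tokens` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LiteralDelimiterDemo {
    // The text \Ahello world!\A Goodbye!\A written as a Java string literal.
    static final String TEXT = "\\Ahello world!\\A Goodbye!\\A";

    // Collects every token the scanner produces for the given delimiter regex.
    static List<String> tokens(String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(TEXT).useDelimiter(delimiterRegex)) {
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Regex \A (start of input): the whole text comes back as one token.
        System.out.println(tokens("\\A"));
        // Regex \\A (a literal backslash followed by A): the text is split.
        System.out.println(tokens("\\\\A"));
    }
}
```

The first call yields a single token containing the whole text; the second yields the pieces between the literal \A markers.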

Thanks to @Faux Pas for pointing that out!

Kuzeko
3

Adding to Kuzeko's answer, \A matches the beginning of the entire text. So, I don't think his 'hello world' example is valid.

Faux Pas
2

The Scanner is not "converted". `useDelimiter` is called on the freshly created instance; it sets the delimiter and returns that same Scanner instance. Then `next` is called on that instance, and it is `next` that returns a String.
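Unrolling the chain into separate statements may make this clearer. The following is only a sketch: the class `ChainDemo` and the method `readAll` are hypothetical names, an in-memory stream stands in for `new URL(...).openStream()`, and the `hasNext()` check is an addition to guard against an empty stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class ChainDemo {
    // The one-liner from the question, written step by step.
    static String readAll(InputStream in) {
        Scanner scanner = new Scanner(in, "UTF-8");       // 1. create the scanner
        Scanner same = scanner.useDelimiter("\\A");       // 2. set delimiter; returns this scanner
        String text = same.hasNext() ? same.next() : ""; // 3. next() returns a String
        scanner.close();                                  // closing also closes the stream
        return text;
    }

    // Checks that useDelimiter hands back the very same object.
    static boolean returnsSameInstance() {
        try (Scanner scanner = new Scanner("abc")) {
            return scanner.useDelimiter("\\A") == scanner;
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "<html>example</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in));
        System.out.println(returnsSameInstance());
    }
}
```

No conversion happens at any point: the String is simply the return value of the last call in the chain.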

You may want to look up Scanner in the Javadoc for further reading: https://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html

Fildor