Implementing the useDelimiter method

Question

I have the following code. Please keep in mind I'm just starting to learn the language and as such have been looking for fairly simple exercises. Coding etiquette and criticism welcome.

import java.util.*;
import java.io.*;

public class Tron
{
    public static void main(String[] args) throws Exception
    {
        int x,z,y = 0;
        File Tron= new File("C:\\Java\\wordtest.txt");
        Scanner word = new Scanner(Tron);
        HashMap<String, Integer> Collection = new HashMap<String, Integer>();
        //noticed that hasNextLine and hasNext both work.....why one over the other?
        while (word.hasNext())
        {
            String s = word.next();
            Collection.get(s);
            if (Collection.containsKey(s))
            {
                Integer n = Collection.get(s);
                n = n+1;
                Collection.put(s,n);
                //why does n++ and n+1 give you different results
            }else
            {
                Collection.put(s,1);
            }       
        }
        System.out.println(Collection);


    }   
}

Without the use of useDelimiter() I get my desired output based on the file I have:

Far = 2, ran = 4, Frog = 2, Far = 7, fast = 1, etc...

Inserting the useDelimiter method as follows

Scanner word = new Scanner(Tron);
word.useDelimiter("\\p{Punct} \\p{Space}");

provides the following output as it appears in the text file shown below.

the the the the the

frog frog

ran

ran ran ran

fast, fast fast

far, far, far far far far far

Why such a difference in output if useDelimiter was supposed to account for punctuation, new lines, etc.? Probably pretty simple, but again, this is my first shot at a program. Thanks in advance for any advice.

Please post the input (the actual contents of *wordtest.txt*) too. — Péter Török, May 09 '12 at 11:42

score 2 · Answer 1 · edited May 23 '17 at 11:48

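With word.useDelimiter("\\p{Punct} \\p{Space}") you are actually telling the scanner to look for delimiters consisting of a punctuation character followed by a space followed by another whitespace character. If that exact three-character sequence never occurs in your file, the scanner finds no delimiter at all and returns the whole file as a single token, which would explain why the output looks like the file itself. You probably wanted to have one (and only one) of these instead, which would be achieved by something like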

word.useDelimiter("\\p{Punct}|\\p{Space}");

or at least one of these, which would look like

word.useDelimiter("[\\p{Punct}\\p{Space}]+");
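
To see the second pattern in action, here is a minimal sketch that counts words from an in-memory string rather than from wordtest.txt. The class name DelimiterDemo, the method countWords and the sample text are made up for illustration; only the delimiter pattern comes from the snippet above.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

class DelimiterDemo {
    // Counts words, treating any run of punctuation and/or whitespace as a single delimiter.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps the keys sorted
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("[\\p{Punct}\\p{Space}]+");
            while (scanner.hasNext()) {
                // merge() stores 1 for a new word, otherwise adds 1 to the existing count
                counts.merge(scanner.next(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, loosely modelled on the file in the question
        String sample = "frog frog\nran\nfast, fast fast\nfar, far, far";
        System.out.println(countWords(sample)); // prints {far=3, fast=3, frog=2, ran=1}
    }
}
```

Note that "fast," and "fast" now end up under the same key, because the comma is consumed as part of the delimiter.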

Update

@Andrzej nicely answered the questions in your code comments (which I forgot about); however, he missed one little detail which I would like to expand on and set straight here.

why does n++ and n+1 give you different results

This obviously relates to the line

            n = n+1;

and my hunch is that the alternative you tried was

            n = n++;

which indeed gives confusing results (namely the end result is that n is not incremented).

The reason is that n++ (the postfix increment operator by its canonical name) increments the value of n but the result of the expression is the original value of n! So the correct way to use it is simply

            n++;

the result of which is equivalent to n = n+1.
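
A small sketch comparing the three variants side by side may help. The class and method names are invented for this example, and it uses a plain int rather than the Integer from the question, though the outcome should be the same either way.

```java
class IncrementDemo {
    // n = n++ : n is incremented, but the old value is then assigned back, so nothing changes
    static int assignPostfix(int n) {
        n = n++;
        return n;
    }

    // n++ as a standalone statement: n ends up one greater
    static int postfixAlone(int n) {
        n++;
        return n;
    }

    // n = n + 1 : the form used in the question's code
    static int assignPlusOne(int n) {
        n = n + 1;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(assignPostfix(4)); // prints 4
        System.out.println(postfixAlone(4));  // prints 5
        System.out.println(assignPlusOne(4)); // prints 5
    }
}
```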

Here is a thread with a code example which hopefully helps you better understand how these operators work.

score 0 · Answer 2 · answered May 09 '12 at 11:58

Péter is right about the regex, you're matching a very specific sequence rather than a class of characters.

I can answer the questions from your source comments:

noticed that hasNextLine and hasNext both work.....why one over the other?

The Scanner class is declared to implement Iterator<String> (so that it can be used in any situation where you want some arbitrary thing that provides Strings). Since the Iterator interface declares a hasNext method, the Scanner needs to implement it with the exact same signature. On the other hand, hasNextLine is a method that the Scanner provides of its own volition.

It's not entirely unusual for a class which implements an interface to declare both a "generically-named" interface method and a more domain-specific method, which both do the same thing. (For example, you might want to implement a game-playing client as an Iterator<GameCommand> - in which case you'd have to declare hasNext, but might want to have a method called isGameUnfinished which did exactly the same thing.)

That said, the two methods aren't identical. hasNext returns true if the scanner has another token to return, whereas hasNextLine returns true if the scanner has another line of input to return.

The difference shows up once every token has been consumed with next(). If the file ends in a newline (or has trailing whitespace or blank lines), hasNext returns false at that point, because only delimiters remain, while hasNextLine still returns true, because the remainder of the last line (possibly empty) has not been read yet. If nothing at all follows the last token, both return false. So inside a loop that reads tokens with next() you want hasNext, and inside a loop that reads with nextLine() you want hasNextLine.
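
If you want to check this yourself, here is a minimal sketch that reads every token from a string and then reports what the two methods say. The class name HasNextDemo and the helper stateAfterAllTokens are hypothetical, and the inputs are made up; as far as I can tell, with the default whitespace delimiter, hasNextLine stays true whenever a newline (or other whitespace) is still left after the last token.

```java
import java.util.Scanner;

class HasNextDemo {
    // Reads all tokens, then returns { hasNext(), hasNextLine() } for the leftover input.
    static boolean[] stateAfterAllTokens(String input) {
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                scanner.next(); // consume tokens; trailing delimiters are left unread
            }
            return new boolean[] { scanner.hasNext(), scanner.hasNextLine() };
        }
    }

    public static void main(String[] args) {
        boolean[] withNewline = stateAfterAllTokens("far ran\n");
        // prints "false true": no token left, but the (empty) rest of the line is still there
        System.out.println(withNewline[0] + " " + withNewline[1]);

        boolean[] withoutNewline = stateAfterAllTokens("far ran");
        // prints "false false": nothing at all is left
        System.out.println(withoutNewline[0] + " " + withoutNewline[1]);
    }
}
```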

why does n++ and n+1 give you different results

This is quite straightforward.

n + 1 simply returns a value that is one greater than the current value of n. Whereas n++ sets n to be one greater, and then returns that value.

So if n was currently 4, then both options would return 5; the difference is that the value of n would still be 4 if you called n + 1 but it would be 5 if you called n++.

In general, it's wise to avoid using the ++ operator except in situations where it's used as boilerplate (such as in for loops over an index). Taking two or three extra characters, or even an extra line, to express your intent more clearly and unambiguously is such a small price that it's almost always worth doing.

*"n++ sets `n` to be one greater, and then returns that value"* - no, its result is the *original* value of `n`. OTOH `++n` does what you describe. — Péter Török, May 09 '12 at 14:04
