I'm writing a program that reads a text file and counts the number of times each word appears. The program should output words that are used more often than some threshold value given by the user. To avoid boring results, I compare against a provided list of the 100 most commonly used words in the English language.

Adding to the HashMap:

try {
    // commonHashMap Filled
    Scanner sc = new Scanner(new File("commonwords.txt"));
    sc.useDelimiter("[^a-zA-Z']");
    String str;
    while (sc.hasNext()) {
        str = sc.next().toLowerCase(Locale.ENGLISH);
        commonHashMap.put(str, 1);
    }
    sc.close();


    // bookHashMap Filled
    sc = new Scanner(new File(book));
    sc.useDelimiter("[^a-zA-Z']");

    // Add the non-common words in the book to HashMap.
    while(sc.hasNext()) {
        str = sc.next().toLowerCase(Locale.ENGLISH);

        if (!commonHashMap.containsKey(str)) {

            if (bookHashMap.containsKey(str)) {
                bookHashMap.put(str, bookHashMap.get(str)+1); }
            else {
                bookHashMap.put(str, 1); }
        }
    }
    sc.close();
}

Displaying the results:

Iterator<Map.Entry<String, Integer>> iterator = bookHashSet.iterator();

while(iterator.hasNext()) {

    Map.Entry<String, Integer> x = iterator.next();

    if (iterator.hasNext()) {

        String key = x.getKey();
        int value = x.getValue();

        if (value > thresholdValue) {
            System.out.println(key + ": " + value); 
        }
    }
}

The output:

1) "The Adventures of Tom Sawyer" by Mark Twain
2) "Tale of Two Cities" by Charles Dickens
3) "The Odyssey" by Homer
Choice Book: 1
Enter Threshold Value: 200
: 27213
don't: 222
tom: 695
huck: 224
me: 212

Where does the "27213" come from?

Ivar
  • Try using one or *multiple* non-alphabetic characters as the delimiter: `[^a-zA-Z']+` – Erwin Bolwidt May 05 '18 at 23:37
  • [What does your step debugger tell you?](http://stackoverflow.com/questions/25385173/what-is-a-debugger-and-how-can-it-help-me-diagnose-problems). Your question can be answered very quickly and easily with your step-debugger. You should always try and solve your problems with a step debugger before coming to StackOverflow. –  May 05 '18 at 23:43

1 Answer

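I've just tried your code, and it is counting empty strings. With the delimiter [^a-zA-Z'], every single non-letter character is its own delimiter, so two adjacent ones (for example, a comma followed by a space) leave an empty token between them. That empty token is the key printed as ": 27213". To count only actual words, add this check before updating the map: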

if (str.length() != 0)

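This skips any token whose length is 0, that is, a token that contains no word at all. You can also call trim() on the token first, although with this delimiter a token never contains whitespace, so the length check alone is enough here.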
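
As a rough illustration, the sketch below reproduces the empty token on a short string and compares the two fixes mentioned on this page: the length check from this answer and the [^a-zA-Z']+ delimiter suggested in the comments. The class name EmptyTokenDemo, the helper count, and the sample sentence are made up for the demonstration and are not part of the original program.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;

// Hypothetical demo class; names and sample text are assumptions for illustration.
public class EmptyTokenDemo {

    // Counts tokens the way the question's loop does, with a configurable delimiter.
    // When skipEmpty is true, zero-length tokens are ignored (the check from the answer).
    static Map<String, Integer> count(String text, String delimiter, boolean skipEmpty) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter(delimiter);
            while (sc.hasNext()) {
                String str = sc.next().toLowerCase(Locale.ENGLISH);
                if (skipEmpty && str.length() == 0) {
                    continue; // empty token between two adjacent delimiters
                }
                counts.merge(str, 1, Integer::sum); // insert 1 or add 1 to the old count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "Tom, Huck -- and Tom!";

        // Original delimiter: the map gains an entry whose key is the empty string.
        System.out.println(count(text, "[^a-zA-Z']", false));

        // Fix 1: keep the delimiter, skip zero-length tokens.
        System.out.println(count(text, "[^a-zA-Z']", true));

        // Fix 2: treat a run of non-letters as one delimiter.
        System.out.println(count(text, "[^a-zA-Z']+", false));
    }
}
```

Both fixes give the same counts on this input. The + delimiter avoids producing the empty tokens in the first place, while the length check filters them out afterwards.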

devkarim