Java Program to count similar words in a paragraph

Question

I am trying to count the number of occurrences of similar words in a paragraph read from a file in Java, but for some reason the count is not working. Can you please tell me how to modify the method so that it works?

void countsmwrd(String str) {
    int count = 0;
    String temp = "";
    ArrayList<String> vx = new ArrayList<String>();
    System.out.println("\nThe tokens are: ");
    StringTokenizer s = new StringTokenizer(str, " ,.", true);
    for (int i = 0; s.hasMoreTokens(); i++) {
        vx.add(s.nextToken());
    }

    for (int i = 0; i < vx.size(); i++) {
        String c = vx.get(i);

        for (int j = i; j < vx.size(); j++) {
            String k = vx.get(j);
            if (c == k && temp.indexOf(c) == -1) {
                count = count + 1;
            }

        }
        if (temp.indexOf(c) == -1) {
            temp = temp + c;
            System.out.println("Character   " + c + "   occurs   " + count + "    times");
        }

        count = 0;
    }
}

possible duplicate of [How do I compare strings in Java?](http://stackoverflow.com/questions/513832/how-do-i-compare-strings-in-java) — Alexis C., May 04 '14 at 14:15
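To illustrate the point behind the linked duplicate, here is a minimal sketch of the difference between == and equals on strings. The class name StringCompareDemo and the helper sameWord are made up for this example and are not part of the question's code.

```java
class StringCompareDemo {
    // Hypothetical helper: true when both strings contain the same characters.
    static boolean sameWord(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Two distinct String objects with identical contents,
        // similar to two tokens read from different positions in the text.
        String first = new String("the");
        String second = new String("the");

        // == compares object references, so this is false here.
        System.out.println("first == second: " + (first == second));

        // equals compares the characters, so this is true.
        System.out.println("first.equals(second): " + sameWord(first, second));
    }
}
```

In the question's inner loop, this would mean testing c.equals(k) rather than c == k.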

Jeff Ward · Answer 1 · 2014-05-04T16:53:08.883

You can use a Set to count the distinct words. I also suggest normalizing your input string by calling str.toLowerCase() so that "The" and "the" count as one word. I would also pass false for the returnDelims parameter of the StringTokenizer, since the delimiters shouldn't count as words. Here is an example:

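import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

// Returns the number of distinct words in str, ignoring case.
public int wordCount(String str) {
    // returnDelims = false, so spaces, commas and periods are not returned as tokens
    StringTokenizer s = new StringTokenizer(str.toLowerCase(), " ,.", false);
    Set<String> uniqueWords = new HashSet<String>();
    while (s.hasMoreTokens()) {
        uniqueWords.add(s.nextToken()); // a Set keeps each word only once
    }
    return uniqueWords.size();
}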
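If the goal is the number of occurrences of each word rather than the number of distinct words, the same tokenizing approach can feed a Map instead of a Set. The following is only a sketch under that assumption; the class name WordFrequency and the method countWords are illustrative names, not part of the original answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

class WordFrequency {
    // Counts how often each word occurs, ignoring case and the delimiters " ,.".
    // A LinkedHashMap keeps the words in order of first appearance.
    static Map<String, Integer> countWords(String str) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        StringTokenizer s = new StringTokenizer(str.toLowerCase(), " ,.", false);
        while (s.hasMoreTokens()) {
            String word = s.nextToken();
            Integer seen = counts.get(word);
            // First sighting starts at 1, later sightings add 1.
            counts.put(word, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The cat saw the dog. The dog ran.");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println("Word " + e.getKey() + " occurs " + e.getValue() + " times");
        }
    }
}
```

The map lookup replaces the nested loops and the temp string from the question, so each token is visited only once.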

score 0 · Answer 2 · answered May 04 '14 at 14:26

Your tokenizer is splitting based on " ,.". You won't have many of those in your typical paragraph. Change it to split on a space only. Even better, split with a regular expression for whitespace, for example str.split("\\s+") for one or more whitespace characters (StringTokenizer itself does not accept regular expressions).
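As a rough sketch of the regular-expression approach, String.split can break a paragraph on runs of whitespace. The class name WhitespaceSplit and the method words are invented for this example. Note that punctuation stays attached to the words with this pattern, so "dog." and "dog" would be different tokens.

```java
class WhitespaceSplit {
    // Splits str on runs of whitespace (spaces, tabs, newlines).
    // The input is trimmed first so that leading whitespace
    // does not produce an empty first element.
    static String[] words(String str) {
        String trimmed = str.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // "\\s+" in Java source is the regex \s+ : one or more whitespace characters.
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        for (String w : words("The cat   saw\tthe dog.\nThe dog ran.")) {
            System.out.println(w);
        }
    }
}
```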

Matt Stevens


"\\s" for white space. Java needs the extra backslash in the string literal so that the regular expression engine receives "\s". – Gilbert Le Blanc May 04 '14 at 14:29
