-3

I am trying to count how many times each word occurs in a paragraph that is read from a file in Java, but for some reason the count is not working. Can you please tell me how to modify the method so that it works?

void countsmwrd(String str) {
    int count = 0;
    String temp = "";
    ArrayList<String> vx = new ArrayList<String>();
    System.out.println("\nThe tokens are: ");
    StringTokenizer s = new StringTokenizer(str, " ,.", true);
    for (int i = 0; s.hasMoreTokens(); i++) {
        vx.add(s.nextToken());
    }

    for (int i = 0; i < vx.size(); i++) {
        String c = vx.get(i);

        for (int j = i; j < vx.size(); j++) {
            String k = vx.get(j);
            if (c == k && temp.indexOf(c) == -1) {
                count = count + 1;
            }

        }
        if (temp.indexOf(c) == -1) {
            temp = temp + c;
            System.out.println("Character   " + c + "   occurs   " + count + "    times");
        }

        count = 0;
    }
}
user1803551
Dev

2 Answers

1

You can use a Set to count the distinct words. I also suggest normalizing your input string by calling str.toLowerCase() so that "The" and "the" count as the same word. I would also pass false for the returnDelims parameter of the StringTokenizer, since the delimiters shouldn't be treated as words. Here is an example:

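import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

// Returns the number of distinct words in str, ignoring case.
public int wordCount(String str) {
    // returnDelims is false, so spaces, commas and periods are not returned as tokens.
    StringTokenizer s = new StringTokenizer(str.toLowerCase(), " ,.", false);
    Set<String> uniqueWords = new HashSet<String>();
    while (s.hasMoreTokens()) {
        uniqueWords.add(s.nextToken());
    }
    return uniqueWords.size();
}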
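
The method above returns the number of distinct words. If the goal is a count per word, as the question describes, one possible variation is to keep a Map from each word to its tally. The following is only a sketch under that assumption: the class name WordFrequency and the method name countWords are made up for illustration, and the delimiter set " ,." is carried over unchanged, so newlines or other punctuation in a real file would need to be added to it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

class WordFrequency {

    // Hypothetical helper: counts how often each word occurs, ignoring case.
    // A LinkedHashMap keeps the words in the order they first appear.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // returnDelims is false, so delimiters are skipped rather than counted.
        StringTokenizer tokens = new StringTokenizer(text.toLowerCase(), " ,.", false);
        while (tokens.hasMoreTokens()) {
            // merge inserts 1 for a new word, or adds 1 to the existing tally.
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The cat saw the dog. The dog ran.");
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println("Word " + entry.getKey() + " occurs " + entry.getValue() + " times");
        }
    }
}
```

Because the words are used as map keys, they are compared with equals() rather than ==, which is the comparison that String contents need in Java.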
Jeff Ward
0

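Your tokenizer treats each character in " ,." (space, comma, period) as a delimiter, and because returnDelims is true those delimiters are returned as tokens as well. Consider splitting on whitespace only. Even better, use String.split with a regular expression for whitespace: "\\s+" matches one or more whitespace characters. Avoid "\\s*", which also matches the empty string between characters.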
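
As a rough illustration of the whitespace-splitting suggestion, here is a minimal sketch that uses String.split with "\\s+" (one or more whitespace characters, so a match is never empty). The class name WhitespaceSplit and the method name splitOnWhitespace are hypothetical. Note that punctuation stays attached to the words with this pattern, so "dog." and "dog" would still be different tokens unless the punctuation is stripped separately.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class WhitespaceSplit {

    // Hypothetical helper: splits text on runs of spaces, tabs and newlines.
    static List<String> splitOnWhitespace(String text) {
        // Trim first so leading whitespace does not produce an empty first token.
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        for (String word : splitOnWhitespace("  the  cat\nsat on the mat ")) {
            System.out.println(word);
        }
    }
}
```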

Matt Stevens