I'm quite new to coding and have run into a problem I can't seem to fix.

I want to write some code that counts words in a sentence, simple enough:

    public static void countNoOfWords(String string) throws Exception
    {
        int countWord = 0;
        boolean word = false;
        int endOfLine = string.length() - 1;
        string.replaceAll("\\p{Punct}+", "a");
        for (int i = 0; i < string.length(); i++)
        {
            if (Character.isLetter(string.charAt(i)) && i != endOfLine) {
                word = true;
            }
            else if (!Character.isLetter(string.charAt(i)) && word) {
                countWord++;
                word = false;
            }
            else if (Character.isLetter(string.charAt(i)) && i == endOfLine) {
                countWord++;
            }
        }
        System.out.println("\nTotal number of words: " + countWord);
        selectAnalysis(string);
    }

But I want it to ignore words made entirely of special characters such as £, $, & etc. At the moment, if I enter a sentence such as "Hello ^^&% mi$$" the answer is still 3, whereas I want it to be 2.

I've tried a number of different solutions, including turning all the words into strings and placing them in an array, replacing special characters with letters, and counting spaces. I've looked around for posts on here and seen some similar questions, but not the answer to mine. I'm really quite stuck!

Thanks for your help.

P.S. Sorry if this isn't posted correctly or is missing something, or if the question has been answered before. I've read a few of these posts, but this is my first one.

  • I'd recommend regular expressions, but then you'd have two problems. – duffymo Apr 10 '16 at 13:13
  • On my computer, the output is 2. – dryairship Apr 10 '16 at 13:14
  • This "replacing special characters with letters" was a good idea, except you want to replace all not-a-letter with a space. What remains is 'regular words'. But, Q.: is `w@rld` to be considered 1 word, no word (because it contains an invalid character), or 2 words - 1 before and 1 after an invalid character? – Jongware Apr 10 '16 at 13:16
  • `string.replaceAll("\\p{Punct}+", "a");` has no effect on its own: strings are immutable, so you have to use the returned value (http://stackoverflow.com/questions/12734721/string-not-replacing-characters) – Pshemo Apr 10 '16 at 13:16
  • You could possibly find a solution here: http://stackoverflow.com/questions/22124429/how-to-count-words-do-not-count-series-of-special-char-using-scanner-loop-arr – Pankaj Verma Apr 10 '16 at 13:18
  • Rad Lexus: Hi, thanks for your comment. I didn't want to replace all the special characters with a space, as then they would count as two separate words, e.g. "W@rld" becomes "W" and "rld". As for "Q.: is w@rld to be considered 1 word, no word (because it contains an invalid character), or 2 words - 1 before and 1 after an invalid character?": I was trying to get it so that, say, "W@rld" would count but "@@@" would not, if that makes sense. – Sean Mcfadzean Apr 10 '16 at 13:26
  • All characters are special to me—especially, m–dash. So, there are two test cases for you: "me—especially," is two words, and "m–dash" is one word. Do you want to exclude "m–dash" because of the n–dash? Natural language is very tricky. It should not be used for coding exercises. See [Word Breaks](http://unicode.org/reports/tr29/#Word_Boundaries). – Tom Blodget Apr 10 '16 at 16:46
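To illustrate the immutability point raised in the comments, here is a minimal sketch. The class name `ImmutableReplaceDemo` and the helper `replacePunctuation` are made up for this example and are not part of the question's code. It shows that calling `replaceAll` without using the result leaves the string untouched, and what the string looks like once the result is kept. That second form is possibly why the question reports 3 words rather than 2, though that is only a guess.

```java
class ImmutableReplaceDemo {

    // Hypothetical helper: returns a NEW string in which every run of
    // ASCII punctuation has been replaced by the single letter "a".
    static String replacePunctuation(String text) {
        return text.replaceAll("\\p{Punct}+", "a");
    }

    public static void main(String[] args) {
        String input = "Hello ^^&% mi$$";

        // The return value is thrown away, so `input` is unchanged.
        input.replaceAll("\\p{Punct}+", "a");
        System.out.println(input);     // Hello ^^&% mi$$

        // Keeping the result gives the modified copy.
        String replaced = replacePunctuation(input);
        System.out.println(replaced);  // Hello a mia
    }
}
```

With the replacement kept, "^^&%" turns into a stand-alone "a", which a letter-based counter would then treat as a word of its own.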

2 Answers

I would start by using `String.split(String)` on white space, creating an array of words. Then I would iterate over the array with an enhanced for-each loop, checking that each word does not consist entirely of punctuation (with your provided regex `\\p{Punct}+`), and increment a counter if it doesn't. Something like¹,

    public static void countNoOfWords(String string) {
        int countWord = 0;
        String[] words = string.split("\\s+");
        for (String word : words) {
            if (!word.matches("\\p{Punct}+")) {
                countWord++;
            }
        }
        System.out.println("Total number of words: " + countWord);
        // selectAnalysis(string);
    }

¹ I removed the `Exception` and commented out the non-posted `selectAnalysis` call so I could run it; it produces your expected output with your provided input.
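As a possible variation on the approach above, here is a sketch that returns the count instead of printing it, which makes it easier to test. The class name `SplitWordCounter` and the method name `countWords` are invented for this example. It also trims the input first, on the assumption that blank input should give 0: without that, `split` yields an empty first token for blank or leading-space input, and the empty token would be counted as a word.

```java
class SplitWordCounter {

    // Counts whitespace-separated tokens that are not made up
    // entirely of ASCII punctuation.
    static int countWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // nothing but whitespace
        }
        int count = 0;
        for (String token : trimmed.split("\\s+")) {
            if (!token.matches("\\p{Punct}+")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Hello ^^&% mi$$"));  // 2
        System.out.println(countWords("   "));              // 0
        // Three pound signs followed by a word: the pound sign is not
        // in the ASCII-only \p{Punct} class, so that token still counts.
        System.out.println(countWords("\u00A3\u00A3\u00A3 pounds")); // 2
    }
}
```

The last call shows a limit of this check: a token of non-ASCII symbols such as the pound sign is still counted as a word.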

Elliott Frisch

The answer that tells you to split the string into an array of words is the way I would go, but out of curiosity I did a Google search and found the following:

`\p{Punct}` is listed under "POSIX character classes (US-ASCII only)" in the Java docs: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

I'm not sure if this causes the problem. Maybe your strings contain non-ASCII characters, or maybe it doesn't matter, but it's worth looking into.

Anyway, when I tried out your program with your example "Hello ^^&% mi$$", it did return the correct answer, 2. But yeah, a much more elegant solution is to split the string into an array of words.

  • You are quite correct, `\p{Punct}` only includes ``!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~``, which would be fine, but as I'm living in England **£** is a pretty notable exception! Thanks for pointing this out! – Sean Mcfadzean Apr 10 '16 at 14:32
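Given that `\p{Punct}` misses £, one way around it might be to turn the test around: instead of rejecting tokens that are all punctuation, count only tokens that contain at least one letter. The sketch below assumes that rule, which matches the "W@rld counts, @@@ does not" requirement stated in the comments on the question. The names `LetterWordCounter` and `countWords` are invented for the example. `\p{L}` is the Unicode "letter" category, so the result does not depend on which symbols surround the letters.

```java
import java.util.regex.Pattern;

class LetterWordCounter {

    // Matches any single Unicode letter.
    private static final Pattern LETTER = Pattern.compile("\\p{L}");

    // Counts whitespace-separated tokens that contain at least one letter.
    static int countWords(String text) {
        int count = 0;
        for (String token : text.split("\\s+")) {
            // Empty tokens (blank or leading-space input) contain no
            // letter, so they are skipped without a special case.
            if (LETTER.matcher(token).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Hello ^^&% mi$$"));              // 2
        // "W@rld" counts; "@@@" and three pound signs do not.
        System.out.println(countWords("W@rld @@@ \u00A3\u00A3\u00A3")); // 1
        System.out.println(countWords(""));                             // 0
    }
}
```

Note that under this rule a token made only of digits, such as "42", is not counted either. If numbers should count as words, the pattern could be widened to `[\\p{L}\\p{N}]`.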