Hi all, I have a question. I have to find how many times each word in a file repeats, so I wrote code like this:

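import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Scanner;

public class CountWords {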

FileReader fr;
ArrayList<String> wordsList = new ArrayList<>();

public CountWords(String fname) {
    try {
        fr = new FileReader(fname);
        Scanner s = new Scanner(fr);

        while(s.hasNextLine()){

            String[] parola = s.nextLine().split("[\\p{Blank}]|[\\p{Punct}&&\\p{Blank}]|[\\p{Punct}]");

            for (int i = 0; i < parola.length;){
                if (parola[i] == " "){
                    i ++;
                }else {
                    wordsList.add(parola[i]);
                    i++;
                }
            }
        }

        s.close();

    }catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

public List<String> getResult() {
    ArrayList<String> res = new ArrayList<String>();
    HashMap<String,Integer> set = new HashMap<>();

    for(String slowa : wordsList) {
        set.put(slowa, set.containsKey(slowa) ? set.get(slowa) + 1 : 1 );
    }


    for(String key : set.keySet()){
        res.add(key + " " + set.get(key));
    }

    return res;
}

}

After my last attempt, where I put a lot of punctuation between the words, the array "parola" also contains the blank spaces. How can I keep the blank spaces out of the array?
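To make the symptom concrete, here is a minimal sketch that runs the split pattern from the code above on the sample text quoted in the comments. The class name `SplitDemo`, the helper methods, and `RUN_PATTERN` are hypothetical names introduced only for this illustration. It suggests that the unwanted entries are empty strings (`""`) rather than single spaces: each blank or punctuation character is its own delimiter, so two delimiters in a row leave an empty string between them. That would also explain why neither `== " "` nor `equals(" ")` filters them out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // Pattern from the question: every single blank or punctuation
    // character counts as a separate delimiter.
    static final String QUESTION_PATTERN =
            "[\\p{Blank}]|[\\p{Punct}&&\\p{Blank}]|[\\p{Punct}]";

    // Hypothetical alternative (not from the question): a whole run of
    // blanks and punctuation is treated as ONE delimiter.
    static final String RUN_PATTERN = "[\\p{Blank}\\p{Punct}]+";

    // Counts the empty strings that split() leaves in the array.
    static long emptyCount(String line, String pattern) {
        return Arrays.stream(line.split(pattern)).filter(String::isEmpty).count();
    }

    // Splits the line and keeps only the non-empty tokens.
    static List<String> nonEmptyTokens(String line, String pattern) {
        List<String> out = new ArrayList<>();
        for (String token : line.split(pattern)) {
            // isEmpty() still matters: split() can return a leading ""
            // when the line starts with a delimiter.
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "ala ma kota ala,... ala kota pies, pies, pies";
        System.out.println(emptyCount(line, QUESTION_PATTERN)); // 6
        System.out.println(nonEmptyTokens(line, RUN_PATTERN));
        // [ala, ma, kota, ala, ala, kota, pies, pies, pies]
    }
}
```

The six empty strings match the stray `6` in the output reported for this sample text. Either change on its own (matching runs with `+`, or skipping tokens with `isEmpty()`) should remove that entry from the count.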

  • `if (parola[i] == " ")` -> [How do I compare strings in Java?](http://stackoverflow.com/questions/513832/how-do-i-compare-strings-in-java) – Pshemo Mar 18 '17 at 11:14
  • Oh, I forgot to mention that I already have the main method and I can't modify it. – Christian Mar 18 '17 at 11:21
  • Yes, I tried the equals method but I get the same output. I also tried a Pattern, but nothing changed: the blank spaces are always included in the array :( – Christian Mar 18 '17 at 11:26
  • For example, the text is "ala ma kota ala,... ala kota pies, pies, pies". Expected output: ala 3 ma 1 kota 2 pies 3. But I get: 6 kota 2 pies 3 ma 1 ala 3 – Christian Mar 18 '17 at 11:31
  • Why do you want to read lines and then split them? Scanner lets you iterate over tokens with `while(s.hasNext()){String token = s.next(); ... }`. – Pshemo Mar 18 '17 at 11:33
  • Because I need only the words, without punctuation and blank spaces, so I thought the split method was the best option. – Christian Mar 18 '17 at 11:39
  • If you want to ignore punctuation characters, you can include them in the Scanner's delimiter. You can use something like `s.useDelimiter("\\W+")`, which means *treat one or more non-word characters (anything other than letters, digits, and underscore) as a delimiter*. You can also add the `(?U)` flag to make `\\W` Unicode-aware, so that letters outside the `a-z` range also count as word characters: `s.useDelimiter("(?U)\\W+");`. – Pshemo Mar 18 '17 at 11:52
  • Thanks, it was helpful :) – Christian Mar 19 '17 at 16:39
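Putting the suggestion from the comments together, a word count built on `Scanner.useDelimiter("(?U)\\W+")` might look like the sketch below. Several parts are assumptions made for illustration, not taken from the thread: the class name `WordCountSketch`, reading from a `String` instead of a file so the example is self-contained, and using a `LinkedHashMap` so the results come out in first-seen order, as in the expected output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

class WordCountSketch {
    // Returns "word count" strings in the order each word first appears.
    static List<String> count(String text) {
        // LinkedHashMap keeps insertion order (an assumption about the
        // desired output; a plain HashMap gives no ordering guarantee).
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (Scanner s = new Scanner(text)) {
            // A run of one or more non-word characters is one delimiter,
            // so no empty tokens are produced between words.
            s.useDelimiter("(?U)\\W+");
            while (s.hasNext()) {
                // Insert 1 for a new word, otherwise add 1 to its count.
                counts.merge(s.next(), 1, Integer::sum);
            }
        }
        List<String> res = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            res.add(e.getKey() + " " + e.getValue());
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(count("ala ma kota ala,... ala kota pies, pies, pies"));
        // [ala 3, ma 1, kota 2, pies 3]
    }
}
```

To read from a file as in the question, the same loop should work with `new Scanner(new FileReader(fname))` in place of `new Scanner(text)`.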
