I am trying to write code that reads text from a file and counts how many times each word (words are separated by whitespace) occurs. My code works and counts all the words accurately, but I'm stuck on one problem: I need to remove commas and other special characters, because otherwise the same word gets separate entries. For example, "language" and "language," show up as two different words, and each gets its own counter.
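A quick way to see why this happens: `HashMap` matches keys using `String.equals`, so a token with a trailing comma is simply a different key. The minimal sketch below counts raw tokens with no cleanup. The class name `SeparateKeysDemo` and the method `countRaw` are hypothetical, and splitting on `\\s+` is an assumption standing in for "separated by whitespace".

```java
import java.util.HashMap;
import java.util.Map;

public class SeparateKeysDemo {
    // Counts whitespace-separated tokens exactly as they appear, with no cleanup
    static Map<String, Integer> countRaw(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            // merge() inserts 1 for a new key, or adds 1 to the existing count
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countRaw("language and language,");
        // "language" and "language," are different strings, so each has its own counter
        System.out.println(counts.get("language"));  // 1
        System.out.println(counts.get("language,")); // 1
    }
}
```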
    package wordcounter;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    public class WordCounter {
        public static void main(String[] args) throws IOException {
            Map<String, Integer> map = new HashMap<>();
            try (BufferedReader br = new BufferedReader(new FileReader("languages.txt"))) {
                String line = br.readLine();
                while (line != null) {
                    // Split on runs of whitespace so repeated spaces or tabs don't produce empty tokens
                    String[] words = line.split("\\s+");
                    for (String word : words) {
                        // Skip empty tokens (e.g. produced by leading whitespace)
                        if (word.isEmpty()) {
                            continue;
                        }
                        // Start the counter at 1 for a new word, otherwise increment it
                        Integer count = map.get(word);
                        map.put(word, count == null ? 1 : count + 1);
                    }
                    line = br.readLine();
                }
            }
            // TreeMap sorts the words alphabetically before printing
            Map<String, Integer> sorted = new TreeMap<>(map);
            for (Map.Entry<String, Integer> entry : sorted.entrySet()) {
                System.out.println(entry.getKey() + "\tCounts: " + entry.getValue());
            }
        }
    }
I know I need to use replace/replaceAll; I just can't figure out where or how. Any help would be appreciated.
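One possible approach, sketched under some assumptions: clean each token with `replaceAll` right after splitting the line and before the token is used as a map key. The pattern below removes runs of characters that are not Unicode letters or digits only at the start (`^...`) and end (`...$`) of a token, so an apostrophe inside a word such as "don't" survives. Lowercasing with `toLowerCase(Locale.ROOT)` is an extra assumption so that "Language" and "language" also merge. The class `CleanWordDemo` and the helper `cleanWord` are hypothetical names.

```java
import java.util.Locale;

public class CleanWordDemo {
    // Strips leading and trailing characters that are not letters or digits,
    // then lowercases the result. Inner punctuation (e.g. apostrophes) is kept.
    static String cleanWord(String word) {
        return word.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "")
                   .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(cleanWord("languages,")); // languages
        System.out.println(cleanWord("languages.")); // languages
        System.out.println(cleanWord("don't"));      // don't
        System.out.println(cleanWord("--"));         // prints an empty line
    }
}
```

In the counting loop, you would call `cleanWord` on each token, skip the token if the result is empty (it was pure punctuation), and use the cleaned string in place of the raw token for both `map.get` and `map.put`. A simpler alternative is to strip punctuation from the whole line before splitting, e.g. `line.replaceAll("[^\\p{L}\\p{N}\\s]", "")`, but that also removes apostrophes and hyphens inside words.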
Input: there are languages, within other languages.

Output:
    languages.	Counts: 5
    languages,	Counts: 1
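To show the cleanup end to end, here is a self-contained sketch that counts the example input line with the input hard-coded instead of read from `languages.txt`. It splits on whitespace, strips leading and trailing non-letter, non-digit characters with `replaceAll`, lowercases (an assumption), and counts into a `TreeMap` so the output is alphabetical. The class `CleanWordCounter` and the method `countWords` are hypothetical names.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class CleanWordCounter {
    // Counts words after stripping leading/trailing non-letter, non-digit
    // characters and lowercasing; TreeMap keeps the keys sorted.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.split("\\s+")) {
            String word = token.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "")
                               .toLowerCase(Locale.ROOT);
            if (word.isEmpty()) {
                continue; // token was only punctuation, or split produced an empty string
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("there are languages, within other languages.");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\tCounts: " + e.getValue());
        }
    }
}
```

For that input line, both "languages," and "languages." collapse into one key, so the program prints `are`, `languages`, `other`, `there` and `within` in that order, with `languages` counted 2 and every other word counted 1.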