I'm writing a program that reads a text file and counts the number of times each word appears. The program should output the words used more often than a threshold value given by the user. To avoid boring results, I skip any word that appears in a provided list of the 100 most commonly used English words.
Adding to the HashMap:
try {
    // Fill commonHashMap with the common words to ignore
    Scanner sc = new Scanner(new File("commonwords.txt"));
    sc.useDelimiter("[^a-zA-Z']");
    String str;
    while (sc.hasNext()) {
        str = sc.next().toLowerCase(Locale.ENGLISH);
        commonHashMap.put(str, 1);
    }
    sc.close();
    // Fill bookHashMap with the non-common words in the book and their counts
    sc = new Scanner(new File(book));
    sc.useDelimiter("[^a-zA-Z']");
    while (sc.hasNext()) {
        str = sc.next().toLowerCase(Locale.ENGLISH);
        if (!commonHashMap.containsKey(str)) {
            if (bookHashMap.containsKey(str)) {
                bookHashMap.put(str, bookHashMap.get(str) + 1);
            } else {
                bookHashMap.put(str, 1);
            }
        }
    }
    sc.close();
} catch (FileNotFoundException e) {
    System.err.println("File not found: " + e.getMessage());
}
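For comparison, here is a minimal sketch of the same counting step, assuming Java 9+ (for Set.of) and reading from a String instead of a file so it runs on its own; WordCountSketch and countWords are hypothetical names. Because commonHashMap only records membership, a Set is enough there, and Map.merge replaces the containsKey/get/put sequence. Note that this sketch's delimiter ends in +, so a run of non-letter characters counts as a single separator.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;

public class WordCountSketch {

    // Hypothetical helper: counts the words in `text` that are not in `common`.
    // "[^a-zA-Z']+" treats a whole run of non-letters (", ", "\n\n", ...) as
    // one separator, so Scanner does not hand back empty tokens.
    static Map<String, Integer> countWords(String text, Set<String> common) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter("[^a-zA-Z']+");
            while (sc.hasNext()) {
                String word = sc.next().toLowerCase(Locale.ENGLISH);
                if (!common.contains(word)) {
                    // Insert 1 for a new word, otherwise add 1 to the old count
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> common = Set.of("the", "and", "a");
        Map<String, Integer> counts =
            countWords("Tom and Huck, the boys.\n\nTom ran.", common);
        System.out.println(counts);
    }
}
```

HashMap does not guarantee an iteration order, so the order of the printed entries may vary between runs or JVMs.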
Displaying the results:
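// Iterate over the map's entries (bookHashMap has no iterator() of its own)
Iterator<Map.Entry<String, Integer>> iterator = bookHashMap.entrySet().iterator();
while (iterator.hasNext()) {
    Map.Entry<String, Integer> x = iterator.next();
    // No extra hasNext() check here: it would silently skip the last entry
    String key = x.getKey();
    int value = x.getValue();
    // Print only the words used more often than the threshold
    if (value > thresholdValue) {
        System.out.println(key + ": " + value);
    }
}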
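The display step can also be written as a for-each loop over entrySet(), which visits every entry without managing the Iterator by hand. A minimal sketch follows; ThresholdSketch and printAbove are hypothetical names, the sample counts are made up, and a TreeMap is used only so the output comes out in alphabetical order. The method returns how many lines it printed, which makes it easy to check.

```java
import java.util.Map;
import java.util.TreeMap;

public class ThresholdSketch {

    // Hypothetical helper: prints every entry whose count is strictly greater
    // than `threshold` and returns how many lines were printed.
    static int printAbove(Map<String, Integer> counts, int threshold) {
        int printed = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > threshold) {
                System.out.println(e.getKey() + ": " + e.getValue());
                printed++;
            }
        }
        return printed;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("alpha", 5);
        counts.put("beta", 1);
        counts.put("gamma", 3);
        // Prints "alpha: 5" then "gamma: 3" (TreeMap iterates in key order)
        printAbove(counts, 2);
    }
}
```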
The output:
1) "The Adventures of Tom Sawyer" by Mark Twain
2) "Tale of Two Cities" by Charles Dickens
3) "The Odyssey" by Homer
Choice Book: 1
Enter Threshold Value: 200
: 27213
don't: 222
tom: 695
huck: 224
me: 212
Where does the "27213" on the first output line come from? The key printed before the colon appears to be empty.
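One way to test the suspicion that the empty string is being counted as a word: the Scanner documentation notes that a delimiter matching only one character at a time can return empty tokens. With "[^a-zA-Z']", two adjacent non-letters (such as ", ", ". ", or a "\r\n" line ending) leave nothing between them, and that nothing comes back as "". A minimal, self-contained check, with EmptyTokenDemo and tokens as hypothetical names, compares the original pattern with "[^a-zA-Z']+":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class EmptyTokenDemo {

    // Hypothetical helper: collects every token Scanner returns for `text`
    // when split with the given delimiter pattern.
    static List<String> tokens(String text, String delimiter) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter(delimiter);
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Tom, come here.";
        // One non-letter per delimiter match: "," and " " are two separators
        // in a row, so an empty token sits between them.
        System.out.println(tokens(text, "[^a-zA-Z']"));  // [Tom, , come, here]
        // A run of non-letters is matched as a single separator.
        System.out.println(tokens(text, "[^a-zA-Z']+")); // [Tom, come, here]
    }
}
```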