IMO, streams are not very efficient for this, because it is difficult to extract and act on information that changes while the stream is being processed (such as the current highest count), unless you write your own collector.
This method uses Java 8+ Map enhancements such as merge and computeIfAbsent. It also finds the most frequent words, including ties, in a single iteration. It does this by using two maps:
- individualFrequencies: each word's number of occurrences, keyed by the word.
- equalFrequencies: the words that share the same frequency, keyed by that frequency.
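The two Map methods the approach relies on behave as follows. This is a minimal sketch; the class name MergeDemo and the sample keys are illustrative only. merge stores the given value for a new key, otherwise combines the old value with it using the supplied function, and returns the resulting value. computeIfAbsent creates a value only when the key is missing and returns the stored value in both cases, so it can be chained with add.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative class name; demonstrates Map.merge and Map.computeIfAbsent in isolation
class MergeDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        // New key: stores 1 and returns 1
        int first = counts.merge("ram", 1, Integer::sum);
        // Existing key: applies Integer::sum to the old value and 1, stores and returns 2
        int second = counts.merge("ram", 1, Integer::sum);
        System.out.println(first + " " + second);

        Map<Integer, List<String>> byCount = new HashMap<>();
        // First call creates the list for key 2; the second call reuses that same list
        byCount.computeIfAbsent(2, k -> new ArrayList<>()).add("ram");
        byCount.computeIfAbsent(2, k -> new ArrayList<>()).add("is");
        System.out.println(byCount);
    }
}
```

Running it prints 1 2 followed by {2=[ram, is]}.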
- Map.merge computes the running count of each word as it is encountered and stores it in individualFrequencies, declared as Map<String, Integer>.
- equalFrequencies tallies the words that reach each frequency. It is declared as Map<Integer, List<String>>.
- If the count returned by merge is greater than or equal to maxCount, maxCount is updated to that count and the word is added to the list stored in equalFrequencies under that count. If no list exists yet for that count, a new one is created and the word is added to it; Map.computeIfAbsent handles this in one call. Note that this map may accumulate a lot of outdated entries for lower counts as processing continues. The only entry you want is the one stored under the final maxCount key.
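To see the outdated entries mentioned above, here is a sketch that runs the same loop on a tiny hypothetical input (b a b c a) and returns the whole equalFrequencies map. The class and method names (TieTrace, trace) are illustrative, and a TreeMap is swapped in for the HashMap only so the keys print in order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative names; same loop as the answer, but the tally map is returned for inspection
class TieTrace {
    static Map<Integer, List<String>> trace(String[] words) {
        int maxCount = 0;
        Map<String, Integer> individualFrequencies = new HashMap<>();
        // TreeMap (instead of HashMap) only so the printed keys appear in ascending order
        Map<Integer, List<String>> equalFrequencies = new TreeMap<>();
        for (String word : words) {
            int count = individualFrequencies.merge(word, 1, Integer::sum);
            if (count >= maxCount) {
                maxCount = count;
                equalFrequencies.computeIfAbsent(count, k -> new ArrayList<>()).add(word);
            }
        }
        return equalFrequencies;
    }

    public static void main(String[] args) {
        // Hypothetical toy input chosen so that a stale entry is left behind
        System.out.println(trace(new String[] {"b", "a", "b", "c", "a"}));
    }
}
```

This prints {1=[b, a], 2=[b, a]}. The entry under key 1 is stale (it stopped being updated once maxCount reached 2, which is why c never appears in it), and only the entry under the final maxCount, 2, is the answer.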
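import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MostFrequentWords {
    public static void main(String[] args) {
        String sentence = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        int maxCount = 0;
        Map<String, Integer> individualFrequencies = new HashMap<>();
        Map<Integer, List<String>> equalFrequencies = new HashMap<>();
        // Lower-case first, then split on runs of punctuation and whitespace
        for (String word : sentence.toLowerCase().split("[!;:,.\\s]+")) {
            // merge returns this word's updated count
            int count = individualFrequencies.merge(word, 1, Integer::sum);
            if (count >= maxCount) {
                maxCount = count;
                // File the word under its count, creating the list on first use
                equalFrequencies
                        .computeIfAbsent(count, v -> new ArrayList<>())
                        .add(word);
            }
        }
        // Only the entry for the final maxCount holds the most frequent words
        for (String word : equalFrequencies.get(maxCount)) {
            System.out.printf("%s --> %d%n", word, maxCount);
        }
    }
}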
This prints:
ram --> 3
is --> 3
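For comparison with the remark about streams at the top, a stream-based version needs two passes: one to count every word, and a second to find the maximum and keep the words that reach it. This is a sketch of that alternative, not part of the single-pass method above; the class and method names (StreamVersion, mostFrequent) are illustrative, and it assumes the sentence contains at least one word, since Collections.max throws on an empty collection.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative names; a two-pass stream alternative, not the answer's single-pass approach
class StreamVersion {
    static List<String> mostFrequent(String sentence) {
        // Pass 1: count every word with groupingBy + counting
        Map<String, Long> counts = Arrays.stream(sentence.toLowerCase().split("[!;:,.\\s]+"))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Pass 2: find the highest count (assumes at least one word was found)
        long max = Collections.max(counts.values());
        // Keep every word with that count; sorted only to make the order deterministic
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == max)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String sentence = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        System.out.println(mostFrequent(sentence));
    }
}
```

This prints [is, ram]. Unlike the single-pass loop, the encounter order is lost because groupingBy collects into a HashMap, hence the sort.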
It's interesting to note that not all words will appear in the equalFrequencies map. Which ones do is dictated by the order in which the words are processed. As soon as one word is repeated, any words that follow won't appear unless their count ties or exceeds the current maxCount.
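To check which words are left out for the example sentence, the sketch below reruns the same loop and subtracts every word recorded under any count from the set of all distinct words. MissingWords and missing are illustrative names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative names; reruns the answer's loop and reports words never stored in equalFrequencies
class MissingWords {
    static Set<String> missing(String sentence) {
        int maxCount = 0;
        Map<String, Integer> individualFrequencies = new HashMap<>();
        Map<Integer, List<String>> equalFrequencies = new HashMap<>();
        for (String word : sentence.toLowerCase().split("[!;:,.\\s]+")) {
            int count = individualFrequencies.merge(word, 1, Integer::sum);
            if (count >= maxCount) {
                maxCount = count;
                equalFrequencies.computeIfAbsent(count, v -> new ArrayList<>()).add(word);
            }
        }
        // Start from every distinct word (TreeSet keeps them sorted), then drop recorded ones
        Set<String> absent = new TreeSet<>(individualFrequencies.keySet());
        for (List<String> tied : equalFrequencies.values()) {
            absent.removeAll(tied);
        }
        return absent;
    }

    public static void main(String[] args) {
        String sentence = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        System.out.println(missing(sentence));
    }
}
```

This prints [algorithms, blore, from, good, in]. Each of these occurs once, just like employee, but they are all processed after "ram" has been repeated and maxCount has risen to 2, so they are never recorded.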