I have a sample string in the input format below. I'm trying to fetch the most repeated words along with their occurrence counts, as shown in the expected output. How can this be achieved using the Java 8 Streams API?

Input:

"Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms."

Expected Output:

Ram -->3
is -->3
dev007

2 Answers

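    // Requires java.util.*, java.util.function.Function and java.util.stream.Collectors.
    String text = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
    // Split on runs of non-alphanumeric characters, so "RAM!" yields "RAM".
    List<String> wordsList = Arrays.asList(text.split("[^a-zA-Z0-9]+"));
    // Count occurrences case-insensitively by lowercasing each word first.
    Map<String, Long> wordFrequency = wordsList.stream().map(word -> word.toLowerCase())
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

    // Highest frequency; throws NoSuchElementException if the map is empty.
    long maxCount = Collections.max(wordFrequency.values());

    // Keep only the words whose count equals the maximum, so ties are included.
    Map<String, Long> maxFrequencyList = wordFrequency.entrySet().stream().filter(e -> e.getValue() == maxCount)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

    System.out.println(maxFrequencyList);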
Abhishek
  • This gets the frequency of every word, but I need only the most frequent words, as shown in the expected output – dev007 May 24 '23 at 11:03
  • Thanks a lot.. It is working as expected – dev007 May 24 '23 at 11:28
  • This won't work. You removed the exclamation point from the OP's sentence after RAM. – WJS May 24 '23 at 13:52
  • Just need to remove the special char. Updated answer. – Abhishek May 24 '23 at 14:49
  • There’s no sense in performing a (regex based) `replaceAll` before performing the (also regex based) `split` operation. Just use the pattern directly with `split`, i.e. `split("[^a-zA-Z0-9]+")`. Further, when you’re unconditionally calling `get()` on the optional anyway, you can also use `long maxCount = Collections.max(wordFrequency.values());` It has the same behavior of throwing a `NoSuchElementException` when the map is empty. – Holger May 25 '23 at 07:22
  • Note further that you can use `Arrays.stream(text.split("[^a-zA-Z0-9]+"))` to create a `Stream` in the first place, without a `List` detour. See also [this answer](https://stackoverflow.com/a/40933002/2711488) – Holger May 25 '23 at 07:31
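
Putting the suggestions from the comments together, a stream-only version might look like the sketch below. It is not the answerer's code: the class name `MostFrequentWords`, the method name `mostFrequent`, the empty-input guard, and the use of `LinkedHashMap` (to keep words in first-seen order) are assumptions added for illustration. Words are lowercased, so the output shows `ram` rather than `Ram`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MostFrequentWords {

    // Hypothetical helper: returns the most frequent words (lowercased) with their counts,
    // in the order in which each word first appears in the text.
    static Map<String, Long> mostFrequent(String text) {
        // Stream the split result directly, without going through a List.
        Map<String, Long> frequency = Arrays.stream(text.split("[^a-zA-Z0-9]+"))
                .filter(word -> !word.isEmpty())   // split can yield a leading empty token
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));

        // Assumption: return an empty map instead of letting Collections.max throw.
        if (frequency.isEmpty()) {
            return Collections.emptyMap();
        }

        long maxCount = Collections.max(frequency.values());

        // Keep every word that ties for the maximum count.
        return frequency.entrySet().stream()
                .filter(e -> e.getValue() == maxCount)
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        String text = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";
        mostFrequent(text).forEach((word, count) -> System.out.println(word + " -->" + count));
        // ram -->3
        // is -->3
    }
}
```

This still walks the data twice (once to count, once to filter), which is the trade-off of staying with the built-in collectors.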

IMO, streams are not a good fit for this, as it is difficult to track and update state that may change during processing (such as the running maximum count) from within the stream, unless you write your own collector.

This method uses Java 8+ map enhancements such as merge and computeIfAbsent. It computes the word frequencies, including ties for the maximum, in a single iteration by using two maps.

  • individualFrequencies - A map of each word's number of occurrences, keyed by the word.
  • equalFrequencies - A map that contains those words that have the same frequencies, keyed by the frequency.
  • The Map.merge method is used to compute the frequency of each word encountered, in a Map<String, Integer>.
  • The other map is used to tally all the words that have a given frequency. It is declared as Map<Integer, List<String>>.
  • If the count returned by merge is greater than or equal to maxCount, the word is added to the list obtained from the equalFrequencies map for that count. If no list exists yet for that count, a new list is created and the word is added to it. Map.computeIfAbsent facilitates this process. Note that this map may accumulate lots of outdated garbage as new entries are added. The only entry one wants at the end is the one retrieved by the maxCount key.
// Requires java.util.ArrayList, HashMap, List and Map.
String sentence = "Ram is employee of ABC company, ram is from Blore, RAM! is good in algorithms.";

int maxCount = 0;
Map<String, Integer> individualFrequencies = new HashMap<>();
Map<Integer, List<String>> equalFrequencies = new HashMap<>();

for (String word : sentence.toLowerCase().split("[!;:,.\\s]+")) {
    // merge returns the updated count for this word
    int count = individualFrequencies.merge(word, 1, Integer::sum);
    if (count >= maxCount) {
        maxCount = count;
        // record the word under its current count, creating the list if needed
        equalFrequencies
                .computeIfAbsent(count, k -> new ArrayList<>())
                .add(word);
    }
}

// only the list stored under maxCount holds the final answer
for (String word : equalFrequencies.get(maxCount)) {
    System.out.printf("%s --> %d%n", word, maxCount);
}

prints

ram --> 3
is --> 3

It's interesting to note that not all words will appear in the equalFrequencies map. This behavior is dictated by the order in which the words are processed. As soon as one word is repeated, any others that follow won't appear unless they either tie or exceed the current maxCount.
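
To make that order dependence concrete, here is a small sketch that wraps the same loop in a method and prints the tally map for two orderings of the same words. The class name `EqualFrequenciesDemo`, the method name `tally`, and the use of a `TreeMap` (only so the printed keys come out sorted) are assumptions for illustration and are not part of the approach described here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EqualFrequenciesDemo {

    // Runs the single-pass loop and returns the tally map so it can be inspected.
    static Map<Integer, List<String>> tally(String sentence) {
        int maxCount = 0;
        Map<String, Integer> individualFrequencies = new HashMap<>();
        // TreeMap is used here only to get a predictable print order.
        Map<Integer, List<String>> equalFrequencies = new TreeMap<>();

        for (String word : sentence.toLowerCase().split("[!;:,.\\s]+")) {
            int count = individualFrequencies.merge(word, 1, Integer::sum);
            if (count >= maxCount) {
                maxCount = count;
                equalFrequencies.computeIfAbsent(count, k -> new ArrayList<>()).add(word);
            }
        }
        return equalFrequencies;
    }

    public static void main(String[] args) {
        // "a" reaches 2 before "b" and "c" are seen, so they are never recorded.
        System.out.println(tally("a a b c")); // {1=[a], 2=[a]}
        // Here every word ties the running maximum of 1 before "a" repeats.
        System.out.println(tally("b c a a")); // {1=[b, c, a], 2=[a]}
    }
}
```

In both cases the list stored under the highest key is the correct answer; only the leftover entries under the lower keys differ.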

WJS