3

I want to write a program that displays, for example, the 5 most repeated words in a text. The words of the text are saved in a map whose keys are the words and whose values are the number of times each word is repeated.

This program displays the most repeated word, but I don't know how to improve it to display the 5 most repeated words (and how to use a map instead of a list).

import java.io.BufferedReader;    
import java.io.FileReader;    
import java.util.ArrayList;    

public class MostRepeatedWord {    

    public static void main(String[] args) throws Exception {    
        String line, word = "";    
        int count = 0, maxCount = 0;    
        ArrayList<String> words = new ArrayList<String>();    

        //Opens file in read mode    
        FileReader file = new FileReader("data.txt");
        BufferedReader br = new BufferedReader(file);    

        //Reads each line    
        while((line = br.readLine()) != null) {    
            String string[] = line.toLowerCase().split("([,.\\s]+)");
            //Adding all words generated in previous step into words    
            for(String s : string){    
                words.add(s);    
            }    
        }    

        //Determine the most repeated word in a file    
        for(int i = 0; i < words.size(); i++){    
            count = 1;    
            //Count each word in the file and store it in variable count    
            for(int j = i+1; j < words.size(); j++){    
                if(words.get(i).equals(words.get(j))){    
                    count++;    
                }     
            }    
            //If maxCount is less than count then store value of count in maxCount     
            //and corresponding word to variable word    
            if(count > maxCount){    
                maxCount = count;    
                word = words.get(i);    
            }    
        }    

        System.out.println("Most repeated word: " + word);    
        br.close();    
    }    
}   
Levi007
  • 2
    Here is a question on how to count the items in a list, https://stackoverflow.com/questions/505928/how-to-count-the-number-of-occurrences-of-an-element-in-a-list – Joakim Danielson Nov 12 '19 at 14:22

4 Answers

7

This is one of those cases where a functional style can lead to code that's a lot shorter, and hopefully more understandable!

Once you have the words list, you can simply use:

words.groupingBy{ it }
     .eachCount()
     .toList()
     .sortedByDescending{ it.second }
     .take(5)

The groupingBy() creates a Grouping from the list of words.  (Normally, you'd give a key selector function, explaining what to group the items on, but in this case we want the words themselves, hence the it.)  Since we only care about the number of occurrences, eachCount() gets the counts.  (Thanks to Ilya and Tenfour04 for that part.)

Then we convert the map into a list, ready to be sorted.  The list consists of pairs, with the word as the first value, and the count as the second.

So the sortedByDescending{ it.second } sorts by the count.  And because we're sorting in descending order, it gives the most frequently-used words first.

Finally, take(5) takes the first five values from the list, which will be the five most common words (along with their counts).

For example, when I ran this on a few simple sentences, it gave: [(the, 4), (was, 3), (it, 3), (a, 2), (of, 2)].

(If you only want the words, not the counts, you could then use .map{it.first}.  Also, as Tenfour04 suggests, there are better ways of extracting words from text; but that can get quite complicated once you start considering case, apostrophes, hyphens, non-ASCII letters, etc. — and seems like a separate question from getting the most common words.)
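For anyone who wants to stay in Java, as in the question, roughly the same pipeline can be sketched with streams. This is only an illustration and not a translation of the Kotlin snippet above: the class name TopWordsSketch and the method topN are made up for the example, and ties are broken alphabetically, which is an added assumption so that the result is deterministic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class; the names here are illustrative only.
class TopWordsSketch {

    // Returns up to n (word, count) entries, most frequent first.
    static List<Map.Entry<String, Long>> topN(List<String> words, int n) {
        // Step 1: build the word -> count map (the rough analogue of groupingBy/eachCount).
        Map<String, Long> counts = words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Step 2: sort entries by count (descending), then by word to break ties, and keep n.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "it", "was", "the", "best", "of", "times", "it", "was", "the", "the");
        System.out.println(topN(words, 2)); // prints [the=3, it=2]
    }
}
```

If only the words are needed, mapping each entry with Map.Entry::getKey before collecting would drop the counts.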

gidds
  • Good, but `groupingBy { it }.eachCount()` instead of `groupBy { it }.mapValues { it.value.size }` will make it even better. – Ilya Nov 13 '19 at 14:38
  • @Ilya Updated, thanks!  (I wasn't going to steal Tenfour04's idea, but since you told me to… :-) – gidds Nov 13 '19 at 14:55
2

Since you tagged Kotlin, here's an implementation:

fun File.mostCommonWords(count: Int) =
    mutableListOf<String>().also { words ->
        forEachLine { line ->
            words += Regex("[a-zA-Z-]+").findAll(line).map { it.value.lowercase() }
        }
    }
        .groupingBy { it }
        .eachCount() // <-- Here's your map of words to word counts
        .toList()
        .sortedByDescending { it.second }
        .take(count)
        .toMap() // Converts it from a list of pairs back to a map

I think it's less error-prone to find your words instead of splitting by spaces and punctuation. Your sample code misses a lot of types of punctuation. Using groupingBy/eachCount gives you a map of words to their counts, which you can sort to get the most common ones. To sort a map by values you can convert it to a list of pairs first.
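The same "find the words instead of splitting" idea can be sketched in Java, the language of the question, with java.util.regex. The class name WordExtractorSketch and the method wordsIn are hypothetical; the pattern is the one used in the Kotlin code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
class WordExtractorSketch {

    // Runs of ASCII letters and hyphens, as in the Kotlin code above.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z-]+");

    // Returns every match in the line, lower-cased, in order of appearance.
    static List<String> wordsIn(String line) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            // Locale.ROOT avoids locale-specific surprises when lower-casing.
            words.add(m.group().toLowerCase(Locale.ROOT));
        }
        return words;
    }
}
```

For example, wordsIn("Hello, world! (A well-known test?)") gives [hello, world, a, well-known, test]. Note that this simple pattern splits "don't" into "don" and "t", and a hyphen standing on its own would also count as a word.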

Tenfour04
0

Try it as follows:

        Map<String, Integer> words = new HashMap<>();

        try (BufferedReader br = new BufferedReader(new FileReader("data.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                String string[] = line.toLowerCase().split("([,.\\s]+)");
                //Adding all words generated in previous step into words
                for (String s : string) {
                    words.compute(s, (word, count) -> {
                        if (count == null)
                            return 1;
                        return count + 1;
                    });
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        final int TOP = 5;
        final List<Integer> topCounts = words.values().stream()
                .sorted(Comparator.reverseOrder())
                .limit(TOP)
                .collect(Collectors.toList());

        final String[] topWords = new String[TOP];
        words.forEach((word, count) -> {
            int indexOfWord = topCounts.indexOf(count);
            if (indexOfWord > -1) {
                topWords[indexOfWord] = word;
                topCounts.set(indexOfWord, -1);
            }
        });

        System.out.println("Top 5 most repeated words: " + Arrays.toString(topWords));
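As a side note, the counting step that uses compute with an explicit null check can also be written with Map.merge. A minimal sketch, assuming the tokens come from a split like the one above; the class name MergeCountSketch and the method countWords are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; the names here are illustrative only.
class MergeCountSketch {

    // Counts how often each non-empty token occurs.
    static Map<String, Integer> countWords(String[] tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            // split() yields an empty first token when a line starts with a delimiter.
            if (token.isEmpty()) continue;
            // Inserts 1 for a new word, otherwise adds 1 to the existing count.
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }
}
```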
Vasif
0

The simple way is to first count the frequency of each word (how many times it appears), then sort the words by frequency, and then show the top 5 of the result. Strictly speaking, you don't need to sort all of them, but a full sort is simpler to write than selecting only the top 5.

Let us write some code:

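 import java.util.Arrays;
 import java.util.HashMap;
 import java.util.Map;

 class WordCounter {
     // Maps each word to the number of times it has been seen
     private final Map<String, Integer> counts = new HashMap<>();

     public void count(String word) {
        int prev = counts.getOrDefault(word, 0);
        counts.put(word, prev+1);
     }

     public String[] top(int n) {
         String[] sorted = counts.keySet().toArray(new String[0]);
         // Sort the words by descending count
         Arrays.sort(sorted, (a,b) -> counts.get(b).compareTo(counts.get(a)));
         // Clamp n so that fewer than n distinct words does not pad the result with nulls
         return Arrays.copyOfRange(sorted, 0, Math.min(n, sorted.length));
     }
 }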

Now, we call count(word) after reading each word. That takes care of frequencies. And call top(5) after finishing, which sorts results by count (decreasing) and returns the top 5.
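For completeness, here is a sketch of the "select only the top n" alternative mentioned earlier, using a bounded min-heap instead of a full sort. It is not part of WordCounter; the class name TopNHeapSketch is made up, and the order among words with equal counts is left unspecified. With m distinct words this does roughly O(m log n) work instead of O(m log m).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper class; the names here are illustrative only.
class TopNHeapSketch {

    // Returns up to n words from the counts map, most frequent first.
    static List<String> top(Map<String, Integer> counts, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) return result;

        // Min-heap by count: the head is always the weakest current candidate.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) heap.poll(); // evict the least frequent candidate
        }

        // Polling yields ascending counts, so reverse to get most frequent first.
        while (!heap.isEmpty()) result.add(heap.poll().getKey());
        Collections.reverse(result);
        return result;
    }
}
```

For a handful of words the full sort above is perfectly fine; the heap only starts to pay off when the vocabulary is large and n is small.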

tucuxi
  • This code will not compile; arrays don't have a method like `keySet` – Vasif Nov 12 '19 at 15:50
  • 1
    What Java version do you use? counts[b] doesn't work. – Levi007 Nov 12 '19 at 15:52
  • fixed bad syntax. Should have been `get(b)` and `get(a)`. Got confused with C++ while programming directly into the answer. – tucuxi Nov 12 '19 at 16:17
  • Explained 17 hours ago – Vasif Nov 13 '19 at 09:23
  • @Vasif if you downvote based on program errors, you should reverse the downvote once they are solved. I solved mine 18 hours ago (proof: https://ideone.com/2mO88L). Yours still does not show "top 5". – tucuxi Nov 13 '19 at 10:47