
I have a list of sentences. I split each sentence into words, filtered out the unwanted words and punctuation, and stored the result in

ArrayList<ArrayList<String>> sentence

Then I used a HashMap to find the most common word. How can I modify the following HashMap code so that it also finds the most common consecutive pairs of words (n-grams for phrases)?

    HashMap<String, Integer> hashMap = new HashMap<>();

    // Walk through every sentence; each one is already split into words.
    for (int i = 0; i < sentence.size(); i++) {
        ArrayList<String> words = sentence.get(i);
        for (String word : words) {
            // Look up the current count; get() returns null
            // if the word is not in the HashMap yet.
            Integer integer = hashMap.get(word);

            if (integer == null) {
                // First occurrence: store the word as key with a count of 1.
                hashMap.put(word, 1);
            } else {
                // The word is already present: increment its count.
                hashMap.put(word, integer + 1);
            }
        }
    }

I don't know where to start. Should I change the way I split the sentences, or should I not split them at all in the first place?

noob

1 Answer


To find the most common consecutive pairs of words (bigrams, i.e. n-grams with n = 2), you can keep the sentences split exactly as they are. Loop through the sentence ArrayList and, within each sentence, join every word with the word that follows it; use that pair as the key of a new HashMap and the number of times it appears as the value. Then iterate through the new HashMap and pick the pair with the highest value.

    public static String getMostCommonNGram(ArrayList<ArrayList<String>> sentence) {
        HashMap<String, Integer> nGramMap = new HashMap<>();

        // loop through the sentences
        for (ArrayList<String> words : sentence) {
            // loop through the words and create pairs of words
            for (int i = 0; i < words.size() - 1; i++) {
                String nGram = words.get(i) + " " + words.get(i + 1);
                // check if the n-gram already exists in the map
                Integer count = nGramMap.get(nGram);
                // if not, add it to the map with count = 1
                if (count == null) {
                    nGramMap.put(nGram, 1);
                } else {
                    // if yes, increment the count
                    nGramMap.put(nGram, count + 1);
                }
            }
        }

        // find the n-gram with the highest count
        // (returns "" if no sentence has at least two words)
        String mostCommonNGram = "";
        int maxCount = 0;
        for (String nGram : nGramMap.keySet()) {
            int count = nGramMap.get(nGram);
            if (count > maxCount) {
                maxCount = count;
                mostCommonNGram = nGram;
            }
        }
        return mostCommonNGram;
    }
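
If you later want phrases longer than two words, the same idea extends to any n. The following is only a sketch under a few assumptions: the class name `NGramCounter` and the helper names `countNGrams` and `mostCommon` are made up for this example, words are joined with a single space, n-grams never cross a sentence boundary, and ties are broken arbitrarily. It uses `Map.merge` (Java 8+) in place of the explicit null check.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original question or answer.
class NGramCounter {

    // Counts every run of n consecutive words inside each sentence.
    // A sentence shorter than n words contributes nothing.
    static Map<String, Integer> countNGrams(List<? extends List<String>> sentences, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> words : sentences) {
            for (int i = 0; i + n <= words.size(); i++) {
                // Join words i .. i+n-1 into one key, e.g. "the quick".
                String nGram = String.join(" ", words.subList(i, i + n));
                // Insert 1 if the key is absent, otherwise add 1 to the old count.
                counts.merge(nGram, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the key with the highest count, or null if the map is empty.
    // If several keys share the highest count, which one wins is unspecified.
    static String mostCommon(Map<String, Integer> counts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("the", "quick", "fox"),
                Arrays.asList("the", "quick", "dog"));

        // n = 2 reproduces the word-pair case from the question.
        System.out.println(mostCommon(countNGrams(sentences, 2))); // prints "the quick"
    }
}
```

Calling `countNGrams(sentence, 1)` gives the same single-word counts as the loop in the question, so one method can cover both cases. Because the parameter is declared as `List<? extends List<String>>`, an `ArrayList<ArrayList<String>>` can be passed in unchanged.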
Referium