2

I'm storing each word's count in the value field of a HashMap. How can I then get the 500 most frequent words in the text?

public ArrayList<String> topWords(int numberOfWordsToFind, ArrayList<String> theText) {

    //ArrayList<String> frequentWords = new ArrayList<String>();

    ArrayList<String> topWordsArray = new ArrayList<String>();
    HashMap<String, Integer> frequentWords = new HashMap<String, Integer>();
    int wordCounter = 0;

    for (int i = 0; i < theText.size(); i++) {
        if (frequentWords.containsKey(theText.get(i))) {
            // find value and increment
            wordCounter = frequentWords.get(theText.get(i));
            wordCounter++;
            frequentWords.put(theText.get(i), wordCounter);
        } else {
            // new word
            frequentWords.put(theText.get(i), 1);
        }
    }

    for (int i = 0; i < theText.size(); i++) {
        if (frequentWords.containsKey(theText.get(i))) {
            // what to write here?
            frequentWords.get(theText.get(i));
        }
    }
    return topWordsArray;
}
andandandand

3 Answers

4

One other approach is to think of this another way: is a Map really the right conceptual object here? This may be a good use of a much-neglected-in-Java data structure, the bag. A bag is like a set, but allows an item to be in the set multiple times. This greatly simplifies the 'adding a found word' step.

Google's guava-libraries provides a Bag structure, though there it's called a Multiset. Using a Multiset, you could just call .add() once for each word, even if it's already in there. Even easier, though, you could throw your loop away:

Multiset<String> words = HashMultiset.create(theText);

Now that you have a Multiset, what do you do? Well, you can call entrySet(), which gives you a collection of Multiset.Entry objects. You can then stick them in a List (they come in a Set), and sort them using a Comparator. Full code might look like (using a few other fancy Guava features to show them off):

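// Imports this snippet needs:
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Iterables;
import com.google.common.collect.Lists;
import com.google.common.collect.Multiset;
import com.google.common.primitives.Ints;

// theText, numberOfWordsToFind and topWordsArray are the names from the question's method
Multiset<String> words = HashMultiset.create(theText);

List<Multiset.Entry<String>> wordCounts = Lists.newArrayList(words.entrySet());
Collections.sort(wordCounts, new Comparator<Multiset.Entry<String>>() {
    @Override
    public int compare(Multiset.Entry<String> left, Multiset.Entry<String> right) {
        // Note reversal of 'right' and 'left' to get descending order.
        // getCount() returns a primitive int, so use Ints.compare rather than compareTo.
        return Ints.compare(right.getCount(), left.getCount());
    }
});
// wordCounts now contains all the words, sorted by count descending

// Take the first numberOfWordsToFind entries (alternative: use a loop; this is
// simple because it copes easily with fewer elements than requested)
Iterable<Multiset.Entry<String>> topEntries = Iterables.limit(wordCounts, numberOfWordsToFind);

// Guava-ey alternative: use a Function and Iterables.transform, but in this case
// the 'manual' way is probably simpler:
for (Multiset.Entry<String> entry : topEntries) {
    topWordsArray.add(entry.getElement());
}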

and you're done!

Cowan
  • You can do exactly the same with `HashMap`, just sort `Map.Entry` elements in a list. No need to use Guava at all. – yegor256 Apr 22 '12 at 07:01
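
For what it's worth, the plain-HashMap route suggested in the comment above could be sketched roughly like this. It assumes Java 8 or later; the class name TopWords and the alphabetical tie-break between words with equal counts are my own choices for illustration, not something from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopWords {

    // Returns up to `limit` words, most frequent first.
    public static List<String> topWords(int limit, List<String> text) {
        // Count occurrences of each word.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text) {
            counts.merge(word, 1, Integer::sum);
        }

        // Copy the entries into a list so they can be sorted by value.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((left, right) -> {
            // Descending by count; ties broken alphabetically (an assumption).
            int byCount = Integer.compare(right.getValue(), left.getValue());
            return byCount != 0 ? byCount : left.getKey().compareTo(right.getKey());
        });

        // Keep at most `limit` words; copes with fewer distinct words than requested.
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries.subList(0, Math.min(limit, entries.size()))) {
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(topWords(2, text)); // prints [a, b]
    }
}
```

Sorting the whole entry list is O(n log n) in the number of distinct words, which should be fine for typical texts.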
1

Here you can find a guide on how to sort a HashMap by its values. After sorting, you can just iterate over the first 500 entries.

  • he wants to sort by the values, not the keys. – MeBigFatGuy Apr 13 '11 at 17:21
  • 1) I don't think so if I check his code again. 2) The same link also provides an explanation how to sort by values. –  Apr 13 '11 at 17:22
  • that example seems flawed to me. If there are two values with the same value, the first is overwritten. – MeBigFatGuy Apr 13 '11 at 17:23
  • Yes I thought again about it, and it is true, he wants to sort by values. However the link should provide a solution. –  Apr 13 '11 at 17:24
  • It ain't working. The line TreeSet sortedSet = new TreeSet(yourMapValues); creates a Set with just 9 elements in my test cases. – andandandand Apr 13 '11 at 19:46
  • yes as i said it is a bogus example... Look here: http://stackoverflow.com/questions/109383/how-to-sort-a-mapkey-value-on-the-values-in-java – MeBigFatGuy Apr 14 '11 at 00:43
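
To illustrate the TreeSet problem reported in the comments above, here is a minimal sketch. The class and method names are made up for the example. A TreeSet keeps each distinct count only once, so words that share a count collapse into a single element, whereas sorting the values in a List keeps every count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class DuplicateCounts {

    // How many counts survive when the map's values are put in a TreeSet.
    public static int distinctCounts(Map<String, Integer> counts) {
        return new TreeSet<>(counts.values()).size();
    }

    // Sorting the values in a List (descending) keeps duplicate counts.
    public static List<Integer> sortedCounts(Map<String, Integer> counts) {
        List<Integer> values = new ArrayList<>(counts.values());
        values.sort(Collections.reverseOrder());
        return values;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 2);
        counts.put("b", 2);
        counts.put("c", 1);
        System.out.println(distinctCounts(counts)); // prints 2, although there are 3 words
        System.out.println(sortedCounts(counts));   // prints [2, 2, 1]
    }
}
```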
-1

Take a look at the TreeBidiMap provided by the Apache Commons Collections package. http://commons.apache.org/collections/api-release/org/apache/commons/collections/bidimap/TreeBidiMap.html

It allows you to sort the map by either its keys or its values.

Hope it helps.

Zhongxian

ausgoo
  • TreeBidiMap (or Guava's equivalent, BiMap) isn't helpful in this case; both enforce that the values, as well as the keys, are unique. In this case more than one word could have the same count. – Cowan Apr 14 '11 at 00:53
  • You are right. I tested it: when the keys and values are inverted, it deletes entries with the same key. Sorry. – ausgoo Apr 15 '11 at 00:25
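
If you do want to look words up by count, one way around the uniqueness problem described in these comments is to map each count to a list of words rather than to a single word. The sketch below uses only the JDK (Java 8 or later), not TreeBidiMap, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class WordsByCount {

    // Groups words by their count, highest count first.
    public static NavigableMap<Integer, List<String>> groupByCount(Map<String, Integer> counts) {
        NavigableMap<Integer, List<String>> byCount =
                new TreeMap<>(Comparator.<Integer>reverseOrder());
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            // Words sharing a count are appended to the same list instead of overwriting.
            byCount.computeIfAbsent(entry.getValue(), k -> new ArrayList<>()).add(entry.getKey());
        }
        return byCount;
    }

    public static void main(String[] args) {
        // LinkedHashMap only to make the order within each group predictable.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 2);
        counts.put("b", 2);
        counts.put("c", 1);
        System.out.println(groupByCount(counts)); // prints {2=[a, b], 1=[c]}
    }
}
```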