4

does any one have any idea how to sort a list of words in the order of their frequency (least to greatest) using the built in collection.sort and a comparator<string> interface?

I already have a method that gets the count of a certain word in the text file. Now, I just need to create a method that compares the counts of each word and then puts them in a list sorted by the least frequency to the greatest.

Any ideas and tips would be very much appreciated. I'm having trouble getting started on this particular method.

public class Parser implements Comparator<String> {

    public Map<String, Integer> wordCount;

    void parse(String filename) throws IOException {
        File file = new File(filename);
        Scanner scanner = new Scanner(file);

        //mapping of string -> integer (word -> frequency)
        Map<String, Integer> wordCount = new HashMap<String, Integer>();

        //iterates through each word in the text file
        while(scanner.hasNext()) {
            String word = scanner.next();
            if (scanner.next()==null) {
                wordCount.put(word, 1);
            }
            else {
                wordCount.put(word, wordCount.get(word) + 1);;
                }
            }
            scanner.next().replaceAll("[^A-Za-z0-9]"," ");
            scanner.next().toLowerCase();
        }

    public int getCount(String word) {
        return wordCount.get(word);
    }

    public int compare(String w1, String w2) {
        return getCount(w1) - getCount(w2);
    } 

        //this method should return a list of words in order of frequency from least to   greatest
    public List<String> getWordsInOrderOfFrequency() {
        List<Integer> wordsByCount = new ArrayList<Integer>(wordCount.values());
        //this part is unfinished.. the part i'm having trouble sorting the word frequencies
        List<String> result = new ArrayList<String>();


    }
}
Robotnik
  • 3,643
  • 3
  • 31
  • 49
user1333781
  • 107
  • 2
  • 4
  • 10
  • 4
    Sure, create a class that holds a String (for the word), and an int (for the count), and make it implement `Comparable` and then in the compareTo(...) method, compare based on the int values. – Hovercraft Full Of Eels Apr 15 '12 at 01:12
  • Thanks for the advice. I actually mapped the list of words from a string to an integer to keep the count of each frequency. I'm not sure how to sort the words by frequency using the map property and collections.sort method – user1333781 Apr 15 '12 at 01:18
  • Write some code and post it. We'll go over it. – Tony Ennis Apr 15 '12 at 02:18
  • hey, i added my code. I'm not looking for code to copy, but suggestions to improve and finish the frequency method. Thanks! – user1333781 Apr 15 '12 at 02:49
  • Use `Collections.sort(justTheWords, this)` – Torious Apr 15 '12 at 03:16
  • Where you have your comment `//this part is unfinished`, just do: `List justWords = new ArrayList(wordCount.keySet()); List result = Collections.sort(justWords, this);`... – Torious Apr 15 '12 at 14:00
  • ^^Torious, that didn't seem to work when I tried it. Thanks for the help though – user1333781 Apr 15 '12 at 22:46
  • 1
    The idea was that `sort` will call `this.compare()` during sorting, to compare 2 `String`s, which are then compared by first looking up the count in `this` (the `Parser` instance). That's assuming `parse` was called first. I figured that was the intent of the original code, looking at the `compare` method. Did I miss something? – Torious Apr 15 '12 at 23:22
  • @Torious: Oh, I'm sorry. Now I understand how it works (and deleted my comments). I didn't realize how the Comparator would work on the Map, while being declared on , not . Now it is so obvious! :) – user unknown Apr 16 '12 at 03:46

4 Answers4

7

First of all your usage of scanner.next() seems incorrect. next() will return the next word and move onto next one every time you call it, therefore the following code:

if(scanner.next() == null){ ... }

and also

scanner.next().replaceAll("[^A-Za-z0-9]"," ");
scanner.next().toLowerCase();

will consume and then just throw away words. What you probably want to do is:

String word = scanner.next().replaceAll("[^A-Za-z0-9]"," ").toLowerCase();

at the beginning of your while loop, so that the changes to your word are saved in the word variable, and not just thrown away.

Secondly, the usage of the wordCount map is slightly broken. What you want to do is to check if the word is already in the map to decide what word count to set. To do this, instead of checking for scanner.next() == null you should look in the map, for example:

if(!wordCount.containsKey(word)){
  //no count registered for the word yet
  wordCount.put(word, 1);
}else{
  wordCount.put(word, wordCount.get(word) + 1);
}

alternatively you can do this:

Integer count = wordCount.get(word);
if(count == null){
  //no count registered for the word yet
  wordCount.put(word, 1);
}else{
  wordCount.put(word, count+1);
}

I would prefer this approach, because it's a bit cleaner, and does only one map look-up per word, whereas the first approach sometimes does two look-ups.

Now, to get a list of words in descending order of frequencies, you can convert your map to a list first, then apply Collections.sort() as was suggested in this post. Below is a simplified version suited to your needs:

static List<String> getWordInDescendingFreqOrder(Map<String, Integer> wordCount) {

    // Convert map to list of <String,Integer> entries
    List<Map.Entry<String, Integer>> list = 
        new ArrayList<Map.Entry<String, Integer>>(wordCount.entrySet());

    // Sort list by integer values
    Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
        public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
            // compare o2 to o1, instead of o1 to o2, to get descending freq. order
            return (o2.getValue()).compareTo(o1.getValue());
        }
    });

    // Populate the result into a list
    List<String> result = new ArrayList<String>();
    for (Map.Entry<String, Integer> entry : list) {
        result.add(entry.getKey());
    }
    return result;
}

Hope this helps.

Edit: Changed the comparison function as suggested by @dragon66. Thanks.

Community
  • 1
  • 1
rodion
  • 14,729
  • 3
  • 53
  • 55
  • 1
    Why not return (o2.getValue()).compareTo(o1.getValue()) instead of *-1 – dragon66 Apr 15 '12 at 04:55
  • thanks. it helps a lot. but for some reason now "Parser" in "Public class Parser implements Comparator" is underlined in red telling me that it needs to inherit the abstract method? I'm confused why it's doing this. – user1333781 Apr 15 '12 at 22:43
  • Is the `int compare(String w1, String w2)` method still there? If you are using Jdk6 you may also have to prefix your method with `@Override` annotation, otherwise it will complain. Also, you can always put the `getWordInDescendingFreqOrder()` method into another class and reference it from your class. – rodion Apr 16 '12 at 10:03
1

You can compare and extract ideas from the following:

public class FrequencyCount {

    public static void main(String[] args) {

        // read in the words as an array
        String s = StdIn.readAll();
        // s = s.toLowerCase();
        // s = s.replaceAll("[\",!.:;?()']", "");
        String[] words = s.split("\\s+");

        // sort the words
        Merge.sort(words);

        // tabulate frequencies of each word
        Counter[] zipf = new Counter[words.length];
        int M = 0;                                        // number of distinct words
        for (int i = 0; i < words.length; i++) {
            if (i == 0 || !words[i].equals(words[i-1]))   // short-circuiting OR
                zipf[M++] = new Counter(words[i], words.length);
            zipf[M-1].increment();
        }

        // sort by frequency and print
        Merge.sort(zipf, 0, M);                           // sorting a subarray
        for (int j = M-1; j >= 0; j--) {
            StdOut.println(zipf[j]);
        }
    }
}
UVM
  • 9,776
  • 6
  • 41
  • 66
1

A solution, close to your original posting with corrections and the sorting as suggested by Torious in the comments:

import java.util.*;

public class Parser implements Comparator <String> {

    public Map<String, Integer> wordCount;

    void parse ()
    {
        Scanner scanner = new Scanner (System.in);

        // don't redeclare it here - your attribute wordCount will else be shadowed
        wordCount = new HashMap<String, Integer> ();

        //iterates through each word in the text file
        while (scanner.hasNext ()) {
            String word = scanner.next ();
            // operate on the word, not on next and next of next word from Scanner
            word = word.replaceAll (" [^A-Za-z0-9]", " ");
            word = word.toLowerCase ();
            // look into your map:
            if (! wordCount.containsKey (word))
                wordCount.put (word, 1);
            else
                wordCount.put (word, wordCount.get (word) + 1);;
        }
    }

    public int getCount (String word) {
        return wordCount.get (word);
    }

    public int compare (String w1, String w2) {
        return getCount (w1) - getCount (w2);
    }

    public List<String> getWordsInOrderOfFrequency () {
        List<String> justWords = new ArrayList<String> (wordCount.keySet());
        Collections.sort (justWords, this);
        return justWords; 
    }

    public static void main (String args []) {
        Parser p = new Parser ();
        p.parse ();
        List<String> ls = p.getWordsInOrderOfFrequency ();
        for (String s: ls) 
            System.out.println (s);
    }
}
user unknown
  • 35,537
  • 11
  • 75
  • 121
0

rodions Solution is a kind of a Generics hell, but I don't have it simpler - just different.

In the End, his solution is shorter and better.

At the first looks, it seems that a TreeMap might be appropriate, but it sorts by Key, and is of no help for sorting by value, and we can't switch key-value, because we look it up by the key.

So the next idea is to generate a HashMap, and use Collections.sort, but it doesn't take a Map, just Lists for sorting. From a Map, there is entrySet, which produces another Collection, which is a Set, and not a List. That was the point where I took another direction:

I implemented an Iterator: I iterate over the entrySet, and only return Keys, where the value is 1. If the value is 2, I buffer them for later use. If the Iterator is exhausted, I look into the buffer, and if it isn't empty, I use the iterator of the buffer in future, increment the minimum value I look for, and create a new Buffer.

The advantage of an Iterator/Iterable pair is, that the values can be obtained by the simplified for-loop.

import java.util.*;

// a short little declaration :) 
public class WordFreq implements Iterator <Map.Entry <String, Integer>>, Iterable <Map.Entry <String, Integer>>
{
    private Map <String, Integer> counter;
    private Iterator <Map.Entry <String, Integer>> it;
    private Set <Map.Entry <String, Integer>> buf;
    private int maxCount = 1; 

    public Iterator <Map.Entry <String, Integer>> iterator () {
        return this;
    }

    // The iterator interface expects a "remove ()" - nobody knows why
    public void remove ()
    {
        if (hasNext ())
            next ();
    } 

    public boolean hasNext ()
    {
        return it.hasNext () || ! buf.isEmpty ();
    }

    public Map.Entry <String, Integer> next ()
    {
        while (it.hasNext ()) {
            Map.Entry <String, Integer> mesi = it.next ();
            if (mesi.getValue () == maxCount)
                return mesi;
            else
                buf.add (mesi);
        }
        if (buf.isEmpty ())
            return null;
        ++maxCount;
        it = buf.iterator (); 
        buf = new HashSet <Map.Entry <String, Integer>> ();     
        return next ();
    } 

    public WordFreq ()
    {
        it = fill ();
        buf = new HashSet <Map.Entry <String, Integer>> ();
        // The "this" here has to be an Iterable to make the foreach work
        for (Map.Entry <String, Integer> mesi : this)
        {
            System.out.println (mesi.getValue () + ":\t" + mesi.getKey ());
        }
    }

    public Iterator <Map.Entry <String, Integer>> fill ()
    {
        counter = new HashMap <String, Integer> ();
        Scanner sc = new Scanner (System.in);
        while (sc.hasNext ())
        {
            push (sc.next ());
        }
        Set <Map.Entry <String, Integer>> set = counter.entrySet ();
        return set.iterator ();
    }

    public void push (String word)
    {
        Integer i = counter.get (word);
        int n = 1 + ((i != null) ? i : 0); 
        counter.put (word, n);
    }

    public static void main (String args[])
    {
        new WordFreq ();
    }
}

Since my solution reads from stdin, you invoke it with:

cat WordFreq.java | java WordFreq
user unknown
  • 35,537
  • 11
  • 75
  • 121