2

I have been reading up on maps and understand some of the differences in tree maps and hash, sorted maps. I was trying to get a map to be sorted when outputting it.

What I needed to be able to do was:

  1. Take a text file and read in the content.
  2. Break it into separate words. Use the words as the key and the value as how many times the key occurs in the txt file.
  3. If the word is at the end of a sentence I am to make it a separate key. E.g., my and my. are two separate keys.

My problem is that no matter if I declare it as a tree, hash or sorted map, I can't get it to output/iterate through in an ordered way. I wanted it to output with the highest occurring value first, but I can't even get it to output with the key in any order.

public static Map<String, Integer> createDictionary(String _filename)
{
    TreeMap<String, Integer> dictionary = new TreeMap<String, Integer>(); // Changed Hash to _______

    try {
          FileReader myFileReader=new FileReader(_filename); // File reader stream open
          BufferedReader myBuffReader=new BufferedReader(myFileReader);

          String str = "\0";

          while (str != null) { // While there are still strings in the file
              str = myBuffReader.readLine(); // We read a line into the str variable

              if (str != null) { // Make sure its not the last line/EOF 
                  // System.out.println(str); // Used for testing. 
                  StringTokenizer myTokenStr=new StringTokenizer(str," \t"); // Create a StringToken obj from the string

                  while (myTokenStr.hasMoreTokens()) {
                      String tokStr = myTokenStr.nextToken(); // Each token is put into an individual string
                      // System.out.println(tokStr);

                      if (dictionary.containsKey(tokStr)) {
                          int value = dictionary.get(tokStr); // Add one to the integer value
                          // dictionary.remove(tokStr); // Was doing this way but just using put method works 
                          // dictionary.put(tokStr, value + 1);
                          dictionary.put(tokStr, value + 1);
                      }
                      else {
                          dictionary.put(tokStr, 1); // Add the string as the key with an int value of one for the value
                      }
                  }
              }
          }

          myBuffReader.close(); // Close stream
          myFileReader.close(); // Close stream
      }
      catch (FileNotFoundException e) {
          System.out.println("File Not Found");
      }
      catch (IOException e) { }

      // System.out.println(dictionary.entrySet());

      return dictionary;
}
GusP
  • 2,454
  • 2
  • 23
  • 32
aaron burns
  • 267
  • 4
  • 15
  • First off, are you sure you want to tokenize on `" \t"`? When you say "break into words", the default token string seems a better match in the general case, i.e. `StringTokenizer(str)` You don't match newlines the way you tokenize. And how exactly do you iterate? (Also, forget HashMap if you want it sorted, TreeMap is the only way to go) – Irfy Feb 14 '12 at 04:40
  • Stiles... it outputs the correct words with the correct word count but nether the words are in alphabetical order or the value/ word count are in order. Its random like a hash map. Irfy.... the tokinize on " \t" is supposed to tokenize on space and tab. The only other would be end of line and that is already taken care of. I might be wrong but it was similar to this with c++ tokinize and it has worked with the test cases i have sent it. – aaron burns Feb 14 '12 at 05:05

3 Answers3

0

This is the documentation for TreeMap, lifted from its Javadoc:


public class TreeMap extends AbstractMap
   implements NavigableMap, Cloneable, Serializable

A Red-Black tree based NavigableMap implementation. The map is sorted according 
to the natural ordering of its keys, or by a Comparator provided at map creation
time, depending on which constructor is used.

In your case, the keys would be strings, and you should expect that iteration will reveal the map to be sorted according to their 'natural order'. Here's an example of the output generated by a TreeMap consisting of String keys and Integer values:

Map<String, Integer> map = new TreeMap<String, Integer>();
map.put("Hello", Integer.valueOf(8));
map.put("Abraham", Integer.valueOf(81));
map.put("Smell", Integer.valueOf(-1));
map.put("Carpet", Integer.valueOf(4));
map.put("Sex", Integer.valueOf(23));

for(String key: map.keySet()) {
    System.out.printf("Map entry %s: %d\n", key, map.get(key));
}

Output:

Map entry Abraham: 81
Map entry Carpet: 4
Map entry Hello: 8
Map entry Sex: 23
Map entry Smell: -1

As you can see, iterating over the map's keys produces as ordered result. This order is defined by the natural order of String. Unfortunately, you cannot implement a SortedMap that sorts on values, which is what I believe you want to do. You can however, sort the entries in the Map outside of it. See more details in this other SO post: TreeMap sort by value.

Community
  • 1
  • 1
Perception
  • 79,279
  • 19
  • 185
  • 195
  • If I change my maps to TreeMaps and I use the same way as posted to iterate through it , it does not give and orderd output like you've show. That is part of my problem. From what i've read to what i am seeing they are not the same. Looking at my code is there something with how I am iterating through the map that could be doing it. I do not understand the use of the iterator completely. – aaron burns Feb 14 '12 at 05:11
  • @aaronburns - your code looks mostly fine. Try calling `trim()` on your strings before storing them in the map. – Perception Feb 14 '12 at 05:14
0

Your map is sorted alphabetically, not by number of occurrences. You need to postprocess the map after the initial parsing. I would suggest:

  1. Parse file into HashMap<String, Integer>
  2. Iterate through HashMap, and add elements to a TreeMap<Integer, Set<String> > (see below).
  3. Output the TreeMap.

You can achieve step 2. by something like:

TreeMap<Integer, Set<String> > treeMap = new TreeMap<Integer, Set<String> > ();
for (Map.Entry<String, Integer> entry: hashMap) {
    Set<String> set = treeMap.get(entry.value());
    if (set == null) {
        set = new TreeSet<String>();
        treeMap.put(entry.value(), set);
    }
    set.add(entry.key());
}

Using TreeSet here sorts the words with same number of occurrences alphabetically, you could use any other Set or List though.

For descending order in step 3.:

for (Map.Entry<Integer, Set<String> > entry: treeMap.descendingMap())
    for (String word: entry.getValue())
        System.out.println(String.format("%d: %s", entry.getKey(), word));

That should do it.

Irfy
  • 9,323
  • 1
  • 45
  • 67
0

Map is a kind of messy abstraction for this sort of thing, but I'm going to throw out Guava's Multiset as a way to address this use case, as it's expressly designed for "counting occurrences of things."

In particular,

return Multisets.copyHighestCountFirst(HashMultiset.copyOf(listOfWords));

returns a Multiset that iterates over elements in order of descending frequency in listOfWords.

There are many questions on SO, by the way, relating to ordering maps by values instead of keys, but I prefer this solution.

Community
  • 1
  • 1
Louis Wasserman
  • 191,574
  • 25
  • 345
  • 413