Amount of times a String appears in a Java hash table

Question

I am trying to input thousands of Strings from a text file, and then be able to rank the most popular Strings. I'm lost on how to keep track of how many of each String there are.

Do I need to implement another ADT, like linkedlist? I am not allowed to use the java libraries except for ArrayList.

Here is what I have so far.

public class StudentTrends implements Trends {
   int entries = 0;
   //ArrayList<Integer> list;
   String[] table;
   int arraySize;

public StudentTrends() {
    //this.list = new ArrayList<Integer>();
    this.table = new String[10];
    Arrays.fill(table, "-1");
}

//Method I'm having trouble with
@Override
public void increaseCount(String s, int amount) {
    int key = horner(s);

    if(table[key] == null){
        entries++;
        //table[key] = table[key];
    }
    else{
        amount += 1+amount;
    }
}


/**
 * The hashing method used
 * @param key
 *          Is the String inputed
 * @param size
 *          Size of the overall arraylist
 * @return
 *          The int representation of that string
 */
private int horner(String key){
    int val = 0;

    for(int a = 0; a < key.length(); a++){
        val = ((val << 8) | (a)) % table.length;
    }
    table[val] = key;
    return val;
}

And here is the interface I need to implement. Not vital for the post, but it can be used to better understand what I am trying to do.

public interface Trends {   

/**
 * Increase the count of string s.
 * 
 * @param s          String whose count is being increased.
 * @param amount     Amount by which it is being increased.
 */
public void increaseCount(String s, int amount);

/**
 * Return the number of times string s has been seen.
 * @param s     The string we are counting.
 * @return int  The number of times s has been seen thus far.
 */
public int getCount(String s);


/**
 * Get the nth most popular item based on its count.  (0 = most popular, 1 = 2nd most popular).
 * In case of a tie, return the string that comes first alphabetically.
 * @param n         Rank requested
 * @return string   nth most popular string.
 */
public String getNthMostPopular(int n);

/**
 * Return the total number of UNIQUE strings in the list. This will NOT be equal to the number of
 * times increaseCount has been called, because sometimes you will add the same string to the
 * data structure more than once. This function is useful when looping through the results
 * using getNthPopular. If you do getNthPopular(numEntries()-1), it should get the least popular item. 
 * @return  Number of distinct entries.
 */
public int numEntries();

};

In terms of efficiency, I guess a sortable hashmap is more ideal. You can look into TreeMap, which is a structures that implements SortedMap. http://stackoverflow.com/questions/7427758/how-to-use-sortedmap-interface-in-java — Heron Yang, May 06 '17 at 01:03
You probably have to implement something like a hash table, with String as the key and count as the number of times said string is mentioned. I don't think a list will help here, exactly. If you need only one string that is most popular, you can track that with a single reference. However if you need all strings, you'll need to pull all entries out of your table, sort them based on count, and then use that as the "most popular" list. — markspace, May 06 '17 at 01:03
I don't think a TreeMap will work. You have to increase the counts after they are in the tree, which will mess up the order on the tree. That means you have to remove each entry, increase its count, then reinsert it in the tree. One fast sort after all entries are counted sounds more efficient to me, though I haven't tested that. — markspace, May 06 '17 at 01:06
Based on the use cases, if you have lots of "getNthMostPopular" requests and new updates in the same time. It's totally not efficient to run fast sort per request. O(n log n) v.s. O(constant) per request is a big difference. — Heron Yang, May 06 '17 at 01:09

Jacob G. · Answer 1 · 2017-05-06T01:10:46.883

1

If the only Java ADT you're allowed to use is an ArrayList, I suggest you use one and call Collections#sort on it with a custom Comparator, and then Collections#frequency to find the frequency of the most common element.

Assuming list is already initialized with each String:

Collections.sort(list, Comparator.comparing(s -> Collections.frequency(list, s)).reversed());

// Frequency of most common element
System.out.println(Collections.frequency(list, list.get(0)));

Seeing as you're only allowed to use an ArrayList, this method will most likely be too advanced for you. There are ways that you can do it with nested for-loops, but it would be very messy.

edited May 06 '17 at 01:10

answered May 06 '17 at 01:09

Jacob G.

28,856
5
62
116

Do I you the list in conjunction with the array? Or just use the list? – Harrison Bergman May 06 '17 at 01:10
I would just stick with a `List`, as you might not know how many lines are in the file. – Jacob G. May 06 '17 at 01:11
That makes sense. Only issue I'm having is then why is hashing really necessary for what I'm trying to do? – Harrison Bergman May 06 '17 at 01:13
It isn't, I wasn't sure if it was a specification or not so I didn't bother commenting on it – Jacob G. May 06 '17 at 01:15
Alright, no worries. This should make my life much easier. Thanks a lot – Harrison Bergman May 06 '17 at 01:18

score 1 · Answer 2 · answered May 06 '17 at 01:13

1

You don't have to write a hash table for this. You could just have something like this:

class Entry {
    String key;
    int count;
}

List<Entry> entries;

And then when you want to find an entry, just loop over the list:

for (Entry e : entries)  {
    if (e.key.equals(searchKey)) {
        // found it
    }
}

A hash table is a lot better in terms of time complexity, but it's frankly a really daunting task for somebody who's new to data structures. If the hash table is really a necessary part of the assignment, then disregard this, but I just wanted to point out that it's not otherwise strictly necessary.

answered May 06 '17 at 01:13

Radiodef

37,180
14
90
125

This wouldn't work very well if I was asked for the 18th most popular String in the data set. A big part of the assignment is efficiency though, so I may have to go with a hash table. Although, I can't use the Java library – Harrison Bergman May 06 '17 at 01:16
It's perfectly fine. You can sort the list with a comparator, for example. – Radiodef May 06 '17 at 01:16
How would I do the sorting with a comparator in this context? – Harrison Bergman May 06 '17 at 01:58
1

`Collections.sort(entries, Comparator.comparingInt(e -> e.count));`. You will end up having to do something like this anyway. Hash tables are certainly a great thing to learn how write, if you really want to. You need to handle collisions (what if two keys have the same hash?), which is usually done by having an array of linked lists. The hash is an index in to the array, and then you walk the linked list to see if the key already existed. If it didn't exist, then you append a new entry on the tail of the list. You also need to handle resizing the table, maybe. – Radiodef May 06 '17 at 02:33

Amount of times a String appears in a Java hash table

2 Answers2