
Say I want to store a dictionary of strings and check whether some string exists in it. I can use a trie or a HashMap. The HashMap has a time complexity of O(1) with high probability, while the trie in that case has a time complexity of O(k), where k is the length of the string.

Now my question is: Doesn't calculating the hash value of the string have a time complexity of O(k), thus making the complexity of the HashMap the same? If not, why?

The way I see it, a trie would have a lower time complexity than a HashMap for looking up a string, since the HashMap, in addition to calculating the hash value, might hit collisions. Am I missing something?

Update: Which data structure would you use to optimize for speed when constructing a dictionary?

Ahmed Fathy
  • Possible duplicate: http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1. Also, it's not unreasonable to assume that an item's hash code is computed when the object is constructed, or computed once on first use and then cached. In either case, computing the hash code is then O(1), or *amortized* O(1). – Jim Mischel Jul 15 '16 at 03:55

2 Answers


Apart from the implementation complexity of a trie, hash tables benefit from optimizations in the hashCode method that determines an entry's bucket. For java.lang.String, an immutable class, here is what JDK 8 does:

public int hashCode() {
    int h = hash;                       // cached value; 0 means "not yet computed"
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];        // O(k) walk over the characters
        }
        hash = h;                       // cache the result for later calls
    }
    return h;
}

Thus, it is cached (and the caching is thread-safe, since any racing thread simply recomputes the same value). Once calculated, the hash code of a string need not be recalculated. This saves you the O(k) hashing cost on subsequent hash table (or hash set, hash map) operations, though the first computation still costs O(k).
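As a sketch, the same lazy-caching idiom could be applied to your own immutable key class. This is not JDK code; the class and field names are illustrative:

```java
// Sketch of String's lazy hash-caching idiom for a hypothetical
// immutable key class (names are illustrative, not from the JDK).
final class CachedKey {
    private final char[] value;
    private int hash; // defaults to 0, meaning "not yet computed"

    CachedKey(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            for (char c : value) {
                h = 31 * h + c;   // O(k), but only the first time
            }
            hash = h;             // cache for subsequent calls
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedKey
            && java.util.Arrays.equals(value, ((CachedKey) o).value);
    }
}
```

Because the polynomial is the same one String uses, `new CachedKey("abc").hashCode()` equals `"abc".hashCode()`; every call after the first just returns the cached field.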

When implementing dictionaries, I think tries shine where you are more interested in partial matches (say, prefix queries) than exact matches. Generally speaking, hash-based solutions work best for exact matches.
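To make the partial-match point concrete, here is a minimal trie sketch (class and method names are made up for illustration). `contains` is O(k) like the question says, but `startsWith` answers a prefix query directly, which a HashMap could only do by scanning every key:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal trie sketch for illustration. Each node maps a character
// to a child node; isWord marks the end of a stored string.
class Trie {
    private final Map<Character, Trie> children = new HashMap<>();
    private boolean isWord;

    void insert(String word) {
        Trie node = this;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Trie());
        }
        node.isWord = true;
    }

    // Exact match: O(k) for a word of length k.
    boolean contains(String word) {
        Trie node = find(word);
        return node != null && node.isWord;
    }

    // Prefix match: also O(k); a HashMap has no equivalent operation.
    boolean startsWith(String prefix) {
        return find(prefix) != null;
    }

    private Trie find(String s) {
        Trie node = this;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return null;
        }
        return node;
    }
}
```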

Kedar Mhaswade
  • Even if you cache the hash code, you still need to compare the string against other strings it collides with, which can still drive the time complexity up. – templatetypedef Jul 15 '16 at 04:18
  • Also, while we're on the topic of tries vs. hashmaps, let's not forget notable memory usage differences between the two. With a hashmap, if you wanted to store `hat` and `hats`, you'd end up with something like `map['hat']` and `map['hats']`, whereas with a trie you'd just have `{h} -> {a} -> {t} -> {s}`. So in general, hashmaps are more speed efficient but tries are more memory efficient. – Nick Zuber Jul 15 '16 at 16:37

The time complexity of performing operations on a hash table is typically measured in the number of hashes and compares that have to be performed. Measured this way, the expected cost is O(1), because in expectation only a constant number of hashes and compares are needed.

To determine the cost of using a hash table for strings, you do indeed need to factor in the cost of these operations, which is O(k) each for a string of length k. Therefore, the cost of a hash table operation on a string is O(1) · O(k) = O(k), matching the trie cost, though only in expectation and with a different constant factor.
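One way to see the hidden O(k) factor is to instrument a key class so that every character touched during hashing or comparing is counted. This is purely illustrative; the class name and counter are made up:

```java
import java.util.HashMap;

// Illustrative key wrapper that counts character visits, making the
// per-character (O(k)) cost of hashCode and equals observable.
class CountingKey {
    static long charsTouched = 0;
    private final String s;

    CountingKey(String s) { this.s = s; }

    @Override
    public int hashCode() {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
            charsTouched++;           // one unit of work per character hashed
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CountingKey)) return false;
        String t = ((CountingKey) o).s;
        if (t.length() != s.length()) return false;
        for (int i = 0; i < s.length(); i++) {
            charsTouched++;           // comparing is also per-character
            if (s.charAt(i) != t.charAt(i)) return false;
        }
        return true;
    }
}
```

Probing a `HashMap<CountingKey, Integer>` with longer and longer keys shows the counter growing with key length: a constant number of hashes and compares, each costing O(k).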

templatetypedef