1

I wanted to implement an algorithm with Dictionary<Dictionary<char,int>, List<string>> to find the anagram words in a dictionary.

As i need to implement my custom EqualityComparer for this Dictionary, is the access time still O(1) i.e big O (1) ?

Second question, As part of the EqualityComparer I also need to implement the GetHashCode(). What is the efficient way of determining GetHashCode() for Dictionary<Dictionary<char,int>, List<string>>?

i just came up with this method, is there any better alternative?

public int GetHashCode(Dictionary<char, int> obj)
    {
        unchecked
        {
            int hashCode = 17;
            foreach (var item in obj)
            {
                hashCode += 23 * item.Key.GetHashCode();
            }
            return hashCode;
        }
    }

Any word of advice is appreciated. Thanks!

Ben Voigt
  • 277,958
  • 43
  • 419
  • 720
ioWint
  • 1,609
  • 3
  • 16
  • 34

3 Answers3

2

How about converting the word "need" into the string "d1e2n1" instead of using a Dictionary as key? In order to build this string, you could use a binary tree. A char would be used as key and the character count as value. The binary tree is automatically sorted by the key, which is not the case for a dictionary.

You can calculate a combined hash value from single hash values by combining their binary representation with the XOR operation. With C#, you would do something like this:

public override int GetHashCode()
{
    // Combine hashcode of a and b
    return a.GetHashCode() ^ b.GetHashCode();
}

Finding an entry in an unsorted list is an O(n) operation. Finding an entry in a sorted list is an O(log(n)) operation, if a binary search is used.

Finding a word in a list in a dictionary is an O(1 + n) operation, which is the same as an O(n) operation, or an O(1 + log(n)) operation, which is the same as an O(log(n)) operation.


EDIT:

Here is a possible implementation:

var anagrams = new Dictionary<string, List<string>>();
foreach (string word in words) {
    string key = GetFrequency(word);
    List<string> list;
    if (anagrams.TryGetValue(key, out list)) {
        list.Add(word);
    } else {
        list = new List<string> { word };
        anagrams.Add(key, list);
    }
}

It uses this method to get the key:

private string GetFrequency(string word)
{
    var dict = new SortedDictionary<char, int>(); // Implemented as binary tree
    foreach (char c in word.ToLower()) {
        int count;
        if (dict.TryGetValue(c, out count)) {
            dict[c] += 1;
        } else {
            dict[c] = 1;
        }
    }
    return dict.Aggregate(new StringBuilder(), (sb, item) => sb.Append(item.Key).Append(item.Value), sb => sb.ToString());
}

Using this definition for the words ...

var words = new List<string> { "need", "eden", "team", "meat", "meta", "Nat", "tan" };

This test ...

foreach (var item in anagrams.OrderBy(x => x.Key)) {
    Console.WriteLine();
    Console.WriteLine(item.Key + ":");
    foreach (string word in item.Value.OrderBy(w => w)) {
        Console.WriteLine("    " + word);
    }
}

... produces this output

a1e1m1t1:
    meat
    meta
    team

a1n1t1:
    Nat
    tan

d1e2n1:
    eden
    need

EDIT #2:

Here is the frequency calculation as suggest by Ben Voigt

private string GetFrequencyByBenVoigt(string word)
{
    char[] chars = word.ToLower().ToCharArray();
    Array.Sort(chars);
    return new string(chars);
}

The test result would be

aemt:
    meat
    meta
    team

ant:
    Nat
    tan

deen:
    eden
    need
Olivier Jacot-Descombes
  • 104,806
  • 13
  • 138
  • 188
  • good idea in general, but how about just "deen" (i.e. sort the letters alphabetically and preserve repeats)? – Ben Voigt Jan 14 '12 at 23:52
  • true its possible, after getting the CharacterHashMap as a Dictionary for the word in analysis, i could make it a string and have it as the key. but it would need the CharacterHashMap to be sorted before i get the string representation. – ioWint Jan 14 '12 at 23:53
  • And can some help me understand better what would happen if two words which are not anagram but have their HashCode equal but Comprarison fails? how will they be stored in the Dictionary? http://stackoverflow.com/a/3809835/253032 – ioWint Jan 14 '12 at 23:55
  • The dictionary handles hash value collisions automatically. Do not worry about it. – Olivier Jacot-Descombes Jan 15 '12 at 00:39
  • You would use a Dictionary> where the key is the constructed frequency string (letters sorted alphabetically + their frequency). The list would contain all the corresponding the anagrams. – Olivier Jacot-Descombes Jan 15 '12 at 00:46
  • Yes, thats true the dictionary handles collision out of the box. Was jus wondering how it does it! – ioWint Jan 15 '12 at 05:14
  • In this [MSDN article](http://msdn.microsoft.com/en-us/library/ms379571(v=vs.80).aspx) Scott Mitchell discusses the collections of the .NET Framework. He mentions three collision avoidance strategies for dictionaries. Microsoft implemented the rehashing strategy. – Olivier Jacot-Descombes Jan 15 '12 at 16:53
2

The access time of a Dictionary<TKey, TValue> approaches O(1) but is not exactly so. In ideal scenarios (good distribution / few collisions) you can think of it as being O(1) though. In situations where there are lots of collisions due to a low variance in the GetHashCode values the access time degrades and can approach O(N).

JaredPar
  • 733,204
  • 149
  • 1,241
  • 1,454
1

A hash code based on container content will be O(n) in the count of items in the container. You could wrap the dictionary in another type and cache the hash code so it only needs to be computed once... but I can think of several more efficient ways to store that data than a dictionary.

Ben Voigt
  • 277,958
  • 43
  • 419
  • 720
  • When is a theoretical O(1) access time achieved? Do you have any suggestions to make it close for this scenario? – ioWint Jan 14 '12 at 23:59
  • When the hash table used internally by the dictionary contains only a few entries compared to its size and the hash code has a good distribution, O(1) will be reached. Microsoft’s dictionary implementation automatically increases the hash table size before the performance decreases too much. – Olivier Jacot-Descombes Jan 16 '12 at 15:08
  • @Oliver: It's linear in the complexity of each element, but not linear in the number of elements in the dictionary. – Ben Voigt Jan 16 '12 at 15:36