1

I need a data structure like Dictionary<string,T> where I could do both case sensitive and insensitive searchs.

I am looking to improve the O(n) time that I can get with a List<Tuple<string,T>> by iterating with foreach with a case sensitive or insensitive StringComparer.

This is for a library where I want the end user to select case sensitivity on the Search method call. (otherwise I could create a different Dictionary with sensitivity on/off in the class constructor)

Any ideas?

Gerardo Grignoli
  • 14,058
  • 7
  • 57
  • 68
  • You could create 2 dictionaries. One that stores the keys all uppercase (or lowercase) and one that stores the keys in case sensitive form. – itsme86 Oct 21 '19 at 23:23
  • Right, but how do I quantify how many entries should this double structure have in order to the benefit in performance be bigger than doubling memory cost and the increased (2x) Add() cost. – Gerardo Grignoli Oct 21 '19 at 23:25
  • I don't understand - why can you not pass the type of comparer you want to use? Are you saying that the case-sensitive dictionary has different keys with similar characters, like `"First"` and `"first"`? Or is it just for searching? – Rufus L Oct 21 '19 at 23:28
  • @RufusL I first receive the list of values with unknown case, and later the user does case sensitive or insensitive searches. Dictionary only allows to define sensitivity when you create the Dictionary. Once created, it's either sensitive or insensitive. – Gerardo Grignoli Oct 21 '19 at 23:31
  • 1
    `var comparer = StringComparer.OrdinalIgnoreCase;` `var caseInsensitiveDictionary = new Dictionary(comparer);` basically define the comparer to use – Anu6is Oct 21 '19 at 23:32
  • nvm...just saw your other comment – Anu6is Oct 21 '19 at 23:32
  • 3
    `Right, but how do I quantify how many entries should this double structure have in order to the benefit in performance be bigger than doubling memory cost and the increased (2x) Add() cost.` https://ericlippert.com/2012/12/17/performance-rant/ – mjwills Oct 21 '19 at 23:35
  • 3
    I would suggest creating one case insensitive Dictionary, doing all searches on it initially, and when needing a case sensitive search, filter the results down by doing a case sensitive comparison. – NetMage Oct 21 '19 at 23:36
  • Honestly, if the search is by user input I am not convinced a `Dictionary` is the right data structure. Most people will expect a string search to match on a substring, for example, which a `Dictionary` does not handle well. Is there a particular reason you aren't storing this data in a database and using SQL to query it? – mjwills Oct 21 '19 at 23:37
  • You will to get around making your own dictionary. The first Type is a Key, not a searchstring. So the rules about it are pretty fixed. The big issue is that this will cost you performance. As I just recently learned, Dictioanry uses the Hashtable mechanics to speed up Key comparision, and your approach would just sidestep that half the time. – Christopher Oct 21 '19 at 23:37
  • Maybe this would work: `var value = dict.FirstOrDefault(kvp => kvp.Key.Equals("search", StringComparison.OrdinalIgnoreCase)).Value;` Not going to be super fast, though. – Rufus L Oct 21 '19 at 23:39
  • @RufusL & NetMage Your proposals (use dictionary for sensitivity O(n*log ), and foreach on the entries O(n)) at least improves the one of the two scenarios without new costs. Thanks – Gerardo Grignoli Oct 21 '19 at 23:43
  • 1
    Can you say more about how many entries in this dictionary will be collisions under case insensitivity? That is, are you expecting to have `Japan` and `japan` in the dictionary, like, two or three collisions, or are you expecting to have `bananarama`, `bananaRama`, `BanaNaRaMa`, ... with dozens or hundreds or thousands of collisions? It makes a difference what algorithm you should choose. – Eric Lippert Oct 21 '19 at 23:56
  • @EricLippert as this code is a library, it would depend on the end user. Typically it would be less than 100 items with no collisions (different cases should be resolved with the case insensitive search, but improper use could be in the 100k with collisions) – Gerardo Grignoli Oct 22 '19 at 00:05
  • 1
    @mjwills Unfortunately, that won't work as you must keep the original case-sensitive key, so you need something like `Dictionary>` or I chose to use `Dictionary>` wrapping a case-insensitive outer dictionary around multiple case-sensitive inner dictionaries. – NetMage Oct 22 '19 at 00:19
  • 2
    @GerardoGrignoli: Why do you care if "improper" use gives bad results? If users are abusing your tool then they have chosen the wrong tool for the job. If you care about this scenario then you have a pretty difficult problem to solve and you should not be using an off-the-shelf dictionary. You should be researching the abusive cases that you care about and looking to build a special-purpose dictionary designed to have good behaviour in the face of abuse. – Eric Lippert Oct 22 '19 at 00:41
  • After thinking more and reading comments, I think a proper class should have an abstraction opposite the implementation - it should present as a case-sensitive dictionary even if implemented as a case-insensitive dictionary wrapped around case-sensitive dictionaries. I will update my answer. – NetMage Oct 22 '19 at 17:38
  • I re-wrote my answer based around my new thoughts. – NetMage Oct 22 '19 at 22:57
  • Does this answer your question? [Case insensitive access for generic dictionary](https://stackoverflow.com/questions/13230414/case-insensitive-access-for-generic-dictionary) – Michael Freidgeim Sep 22 '22 at 02:07
  • No @MichaelFreidgeim. I knew that before asking the question. – Gerardo Grignoli Oct 06 '22 at 00:44
  • Sorry for the automatically created unintentional question - the comment should be read as [“Possible duplicate”](https://meta.stackexchange.com/questions/339563/the-auto-comment-does-this-answer-your-question-generated-when-voting-to-clos) – Michael Freidgeim Oct 06 '22 at 05:43
  • @MichaelFreidgeim That question asks for any solution and provides a O(n) one, this one is specifically rejecting O(n) and asking for a O(1) solution. If you close this one, David L, wouldn't have posted an O(1) solution, hours ago. What's the joy in censorship? Do you get paid to shutdown conversations? – Gerardo Grignoli Oct 06 '22 at 08:30
  • @GerardoGrignoli, it is not a censorship, “possible duplicate” is a way of cleanup and join multiple similar questions in one “canonical”. Thanks, your comment and David L answer clarified the differences. – Michael Freidgeim Oct 06 '22 at 21:43

6 Answers6

2

You could just use an ordinary dictionary but define an extension method for performing a case-insensitive search:

static class ExtensionMethods
{
    static public T GetValue<T>(this Dictionary<string,T> source, string key, bool caseSensitive)
    {
        if (caseSensitive) return source[key];
        key = source.Keys.FirstOrDefault( k => String.Compare(key, k, StringComparison.CurrentCultureIgnoreCase) == 0);
        if (key == null) throw new KeyNotFoundException();
        return source[key];
    }
}

Or, if you really want, you could subclass the dictionary and make the above a proper instance member.

John Wu
  • 50,556
  • 8
  • 44
  • 80
2

After further thought, and reading the comments, I think the best implementation is to have extend what appears to be a case-sensitive Dictionary with new case-insensitive properties and methods. Since the implementation is based on a case-insensitive Dictionary holding case-sensitive sub-dictionaries, and C# doesn't have private inheritance, it seems best to just implement a new Dictionary wrapper.

public class CaseDictionary<TValue> : IDictionary<string, TValue>, IDictionary, IReadOnlyDictionary<string, TValue> {
    #region Members
    Dictionary<string, Dictionary<string, TValue>> CIDict;
    #endregion

    #region Constructors
    public CaseDictionary() {
        CIDict = new Dictionary<string, Dictionary<string, TValue>>(StringComparer.OrdinalIgnoreCase);
    }

    public CaseDictionary(int init) {
        CIDict = new Dictionary<string, Dictionary<string, TValue>>(init, StringComparer.OrdinalIgnoreCase);
    }

    public CaseDictionary(IDictionary<string, TValue> init)
        : this(init != null ? init.Count : 0) {
        foreach (var kvp in init)
            Add(kvp.Key, kvp.Value);
    }
    #endregion

    #region Properties
    public ICollection<string> Keys => CIDict.Values.SelectMany(v => v.Keys).ToList();
    public ICollection<TValue> Values => CIDict.Values.SelectMany(v => v.Values).ToList();
    public int Count => CIDict.Values.Select(v => v.Count).Sum();

    public TValue this[string aKey]
    {
        get
        {
            if (CIDict.TryGetValue(aKey, out var possibles) && possibles.TryGetValue(aKey, out var theValue))
                return theValue;
            throw new KeyNotFoundException();
        }
        set
        {
            if (CIDict.TryGetValue(aKey, out var possibles)) {
                if (possibles.ContainsKey(aKey))
                    possibles[aKey] = value;
                else
                    possibles.Add(aKey, value);
            }
            else
                CIDict.Add(aKey, new Dictionary<string, TValue>() { { aKey, value } });
        }
    }
    #endregion

    #region Methods
    public void Add(string aKey, TValue aValue) {
        if (CIDict.TryGetValue(aKey, out var values))
            values.Add(aKey, aValue);
        else
            CIDict.Add(aKey, new Dictionary<string, TValue>() { { aKey, aValue } });
    }

    public bool ContainsKey(string aKey) {
        if (CIDict.TryGetValue(aKey, out var possibles))
            return possibles.ContainsKey(aKey);
        else
            return false;
    }

    public bool Remove(string aKey) {
        if (CIDict.TryGetValue(aKey, out var possibles))
            return possibles.Remove(aKey);
        else
            return false;
    }

    public bool TryGetValue(string aKey, out TValue theValue) {
        if (CIDict.TryGetValue(aKey, out var possibles))
            return possibles.TryGetValue(aKey, out theValue);
        else {
            theValue = default(TValue);
            return false;
        }
    }
    #endregion

    #region ICollection<KeyValuePair<,>> Properties and Methods
    bool ICollection<KeyValuePair<string, TValue>>.IsReadOnly => false;

    void ICollection<KeyValuePair<string, TValue>>.Add(KeyValuePair<string, TValue> item) => Add(item.Key, item.Value);
    public void Clear() => CIDict.Clear();

    bool ICollection<KeyValuePair<string, TValue>>.Contains(KeyValuePair<string, TValue> item) {
        if (CIDict.TryGetValue(item.Key, out var possibles))
            return ((ICollection<KeyValuePair<string, TValue>>)possibles).Contains(item);
        else
            return false;
    }

    bool ICollection<KeyValuePair<string, TValue>>.Remove(KeyValuePair<string, TValue> item) {
        if (CIDict.TryGetValue(item.Key, out var possibles))
            return ((ICollection<KeyValuePair<string, TValue>>)possibles).Remove(item);
        else
            return false;
    }

    public void CopyTo(KeyValuePair<string, TValue>[] array, int index) {
        if (array == null)
            throw new ArgumentNullException("array");
        if (index < 0 || index > array.Length)
            throw new ArgumentException("index must be non-negative and within array argument Length");
        if (array.Length - index < Count)
            throw new ArgumentException("array argument plus index offset is too small");

        foreach (var subd in CIDict.Values)
            foreach (var kvp in subd)
                array[index++] = kvp;
    }
    #endregion

    #region IDictionary Methods
    bool IDictionary.IsFixedSize => false;
    bool IDictionary.IsReadOnly => false;
    ICollection IDictionary.Keys => (ICollection)Keys;
    ICollection IDictionary.Values => (ICollection)Values;

    object IDictionary.this[object key]
    {
        get
        {
            if (key == null)
                throw new ArgumentNullException("key");
            if (key is string aKey)
                if (CIDict.TryGetValue(aKey, out var possibles))
                    if (possibles.TryGetValue(aKey, out var theValue))
                        return theValue;

            return null;
        }
        set
        {
            if (key == null)
                throw new ArgumentNullException("key");
            if (value == null && default(TValue) != null)
                throw new ArgumentNullException("value");
            if (key is string aKey) {
                if (value is TValue aValue)
                    this[aKey] = aValue;
                else
                    throw new ArgumentException("value argument has wrong type");
            }
            else
                throw new ArgumentException("key argument has wrong type");
        }
    }

    void IDictionary.Add(object key, object value) {
        if (key == null)
            throw new ArgumentNullException("key");
        if (value == null && default(TValue) != null)
            throw new ArgumentNullException("value");
        if (key is string aKey) {
            if (value is TValue aValue)
                Add(aKey, aValue);
            else
                throw new ArgumentException("value argument has wrong type");
        }
        else
            throw new ArgumentException("key argument has wrong type");
    }

    bool IDictionary.Contains(object key) {
        if (key == null)
            throw new ArgumentNullException("key");

        if (key is string aKey)
            if (CIDict.TryGetValue(aKey, out var possibles))
                return possibles.ContainsKey(aKey);

        return false;
    }

    void IDictionary.Remove(object key) {
        if (key == null)
            throw new ArgumentNullException("key");

        if (key is string aKey)
            Remove(aKey);
    }
    #endregion

    #region ICollection Methods
    bool ICollection.IsSynchronized => false;
    object ICollection.SyncRoot => throw new NotImplementedException();

    void ICollection.CopyTo(Array array, int index) {
        if (array == null)
            throw new ArgumentNullException("array");
        if (array.Rank != 1)
            throw new ArgumentException("array argument can not be multi-dimensional");
        if (array.GetLowerBound(0) != 0)
            throw new ArgumentException("array argument has non-zero lower bound");

        if (array is KeyValuePair<string, TValue>[] kvps) {
            CopyTo(kvps, index);
        }
        else {
            if (index < 0 || index > array.Length)
                throw new ArgumentException("index must be non-negative and within array argument Length");
            if (array.Length - index < Count)
                throw new ArgumentException("array argument plus index offset is too small");
            if (array is DictionaryEntry[] des) {
                foreach (var subd in CIDict.Values)
                    foreach (var kvp in subd)
                        des[index++] = new DictionaryEntry(kvp.Key, kvp.Value);
            }
            else if (array is object[] objects) {   
                foreach (var subd in CIDict.Values)
                    foreach (var kvp in subd)
                        objects[index++] = kvp;
            }
            else
                throw new ArgumentException("array argument is an invalid type");
        }
    }
    #endregion

    #region IReadOnlyDictionary<,> Methods
    IEnumerable<string> IReadOnlyDictionary<string, TValue>.Keys => CIDict.Values.SelectMany(v => v.Keys);
    IEnumerable<TValue> IReadOnlyDictionary<string, TValue>.Values => CIDict.Values.SelectMany(v => v.Values);
    #endregion

    #region Case-Insensitive Properties and Methods
    public ICollection<string> KeysCI => CIDict.Keys;
    public IndexerPropertyAtCI AtCI => new IndexerPropertyAtCI(this);

    public bool ContainsKeyCI(string aKey) => CIDict.ContainsKey(aKey);
    public bool TryGetValueCI(string aKey, out ICollection<TValue> rtnValues) {
        if (CIDict.TryGetValue(aKey, out var theValues)) {
            rtnValues = theValues.Select(v => v.Value).ToList();
            return true;
        }
        else {
            rtnValues = default(List<TValue>);
            return false;
        }
    }

    public class IndexerPropertyAtCI {
        CaseDictionary<TValue> myDict;

        public IndexerPropertyAtCI(CaseDictionary<TValue> d) => myDict = d;

        public ICollection<TValue> this[string aKey] => myDict.CIDict[aKey].Select(v => v.Value).ToList();
    }
    #endregion

    #region IEnumerable Methods
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public IEnumerator<KeyValuePair<string, TValue>> GetEnumerator() {
        foreach (var subd in CIDict.Values)
            foreach (var kvp in subd)
                yield return kvp;
    }

    IDictionaryEnumerator IDictionary.GetEnumerator() => new CaseDictionaryEnumerator(GetEnumerator());

    struct CaseDictionaryEnumerator : IDictionaryEnumerator {
        private IEnumerator<KeyValuePair<string, TValue>> en;

        public CaseDictionaryEnumerator(IEnumerator<KeyValuePair<string, TValue>> anEn) => en = anEn;

        public DictionaryEntry Entry => new DictionaryEntry(en.Current.Key, en.Current.Value);
        public object Current => Entry;

        public bool MoveNext() => en.MoveNext();
        public void Reset() => en.Reset();

        public object Key => en.Current.Key;
        public object Value => en.Current.Value;
    }
    #endregion
}

Given this class, it can be used as:

var d = new CaseDictionary<int>();
d.Add("word", 1);
d.Add("Word", 2);
d.Add("WOrd", 3);
d.Add("word2", 4);
d.Add("worD2", 5);

Console.WriteLine(d.ContainsKey("WOrd"));
Console.WriteLine(d.ContainsKey("WOrd2"));
Console.WriteLine(d.ContainsKeyCI("WOrd2"));
Console.WriteLine(d["word2"]);
d["word2"] = 6;
Console.WriteLine(d["word2"]);

Console.WriteLine();
foreach (var w in d.AtCI["word2"])
    Console.WriteLine(w);

Output is:

True
False
True
4
6

6
5
NetMage
  • 26,163
  • 3
  • 34
  • 55
1

I recently ran across this question since I needed to solve for the same scenario, with the following guidelines:

  • Both types of lookups should be fast (avoid O(n) for case-sensitive path).
  • Use a single dictionary rather than two separate dictionaries (avoid double the space requirement).
  • Avoid needless allocations via string conversion (.ToLower()) when performing lookups.
  • Trade a reasonable amount of additional needed memory for satisfying the above.

I didn't feel that any of the current answers met the above and, as a result, I came up with the following dictionary definition:

Dictionary<string, (string Key, T Value)> _dict = 
    new Dictionary<string, (string Key, T Value)>(StringComparer.OrdinalIgnoreCase);

This allows for case-insensitive hashing of the key, while using a ValueTuple to store the actual key in its raw string form for additional case-sensitive comparisons, if need be, alongside the value. The full implementation looks like the following:

public class MultiCaseDictionary<T>
{
    public MultiCaseDictionary() : this(0) { }

    public MultiCaseDictionary(int capacity)
    {
        _dict = new(capacity, StringComparer.OrdinalIgnoreCase);
    }

    private readonly Dictionary<string, (string Key, T Value)> _dict;

    public bool TryGetValue(string key, out T value, bool isCaseInsensitive = true)
    {
        if (_dict.TryGetValue(key, out (string Key, T Value) match))
        {
            if (isCaseInsensitive)
            {
                value = match.Value;
                return true;
            }

            if (key.Equals(match.Key))
            {
                value = match.Value;
                return true;
            }
        }

        value = default;
        return false;
    }

    public void Add(string key, T value)
    {
        _dict[key] = (key, value);
    }
}

This structure results in excellent performance for both the case sensitive and insensitive paths, and the case-sensitive path is slightly slower than the case-sensitive (optimal) path in the accepted answer due to needing to perform extra work.

However, it is orders of magnitude faster for the case-insensitive path, due to the accepted answer having a runtime complexity of O(n). In addition, the case-insensitive search of MultiCaseDictionary is actually faster than the case-sensitive search of CaseSensitiveDictionary.

Finally, the Add path is slower in the MultiCaseDictionary due to needing to construct a ValueTuple. The following benchmark demonstrates both implementations, running on .NET 6:

Implementations

public class MultiCaseDictionary<T>
{
    public MultiCaseDictionary() : this(0) { }

    public MultiCaseDictionary(int capacity)
    {
        _dict = new(capacity, StringComparer.OrdinalIgnoreCase);
    }

    private readonly Dictionary<string, (string Key, T Value)> _dict;

    public bool TryGetValue(string key, out T value, bool isCaseInsensitive = true)
    {
        if (_dict.TryGetValue(key, out (string Key, T Value) match))
        {
            if (isCaseInsensitive)
            {
                value = match.Value;
                return true;
            }

            if (key.Equals(match.Key))
            {
                value = match.Value;
                return true;
            }
        }

        value = default;
        return false;
    }

    public void Add(string key, T value)
    {
        _dict[key] = (key, value);
    }
}

public class CaseSensitiveDictionary<T>
{
    public CaseSensitiveDictionary() : this(0) { }

    public CaseSensitiveDictionary(int capacity)
    {
        _dict = new(capacity);
    }

    private readonly Dictionary<string, T> _dict;

    public bool TryGetValue(string key, out T value, bool isCaseInsensitive = true)
    {
        if (!isCaseInsensitive)
        {
            if (_dict.TryGetValue(key, out value))
            {
                return true;
            }
        }
        else
        {
            key = _dict.Keys.FirstOrDefault(k =>
                string.Compare(key, k, StringComparison.OrdinalIgnoreCase) == 0);
            if (key == null)
            {
                value = default;
                return false;
            }

            value = _dict[key];
            return true;
        }

        value = default;
        return false;
    }

    public void Add(string key, T value)
    {
        _dict[key] = value;
    }
} 

Benchmarks

The benchmarks preload 1000 entries for TryGetValue tests to demonstrate O(n) behavior. In addition, they insert 10 entries for Add tests to check space requirements and add performance degradation.

[SimpleJob(RuntimeMoniker.Net60, invocationCount: 50000)]
[MemoryDiagnoser]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
[CategoriesColumn]
public class CaseSensitiveDictionaryBenchmarks
{
    // use longer key to simulate expected hash costs
    private const string Key = "100000500";
    private const int Capacity = 1000;
    private const int Iterations = 9;

    private MultiCaseDictionary<int> _multiCaseDictionary;
    private CaseSensitiveDictionary<int> _caseSensitiveDictionary;

    [IterationSetup(Targets = new string[] {
        nameof(MultiCaseDictionary_Search),
        nameof(MultiCaseDictionary_InsensitiveSearch)
    })]
    public void SetupMultiCase()
    {
        _multiCaseDictionary = new MultiCaseDictionary<int>(Capacity);
        for (int i = 100000000; i < 100001000; i++)
        {
            _multiCaseDictionary.Add(i.ToString(), i);
        }
    }

    [IterationSetup(Targets = new string[] {
        nameof(CaseSensitiveDictionary_Search),
        nameof(CaseSensitiveDictionary_InsensitiveSearch),
    })]
    public void SetupCaseSensitive()
    {
        _caseSensitiveDictionary = new CaseSensitiveDictionary<int>(Capacity);
        for (int i = 100000000; i < 100001000; i++)
        {
            _caseSensitiveDictionary.Add(i.ToString(), i);
        }
    }

    [Benchmark(Baseline = true), BenchmarkCategory("TryGetValue")]
    public int CaseSensitiveDictionary_Search()
    {
        int result;
        for (int i = 0; i < Iterations; i++)
        {
            _caseSensitiveDictionary.TryGetValue(Key, out result, false);
        }
        _caseSensitiveDictionary.TryGetValue(Key, out result, false);
        return result;
    }

    [Benchmark, BenchmarkCategory("TryGetValue")]
    public int CaseSensitiveDictionary_InsensitiveSearch()
    {
        int result;
        for (int i = 0; i < Iterations; i++)
        {
            _caseSensitiveDictionary.TryGetValue(Key, out result, true);
        }
        _caseSensitiveDictionary.TryGetValue(Key, out result, true);
        return result;
    }

    [Benchmark, BenchmarkCategory("TryGetValue")]
    public int MultiCaseDictionary_Search()
    {
        int result;
        for (int i = 0; i < Iterations; i++)
        {
            _multiCaseDictionary.TryGetValue(Key, out result, false);
        }
        _multiCaseDictionary.TryGetValue(Key, out result, false);
        return result;
    }

    [Benchmark, BenchmarkCategory("TryGetValue")]
    public int MultiCaseDictionary_InsensitiveSearch()
    {
        int result;
        for (int i = 0; i < Iterations; i++)
        {
            _multiCaseDictionary.TryGetValue(Key, out result, true);
        }
        _multiCaseDictionary.TryGetValue(Key, out result, true);
        return result;
    }

    [Benchmark(Baseline = true), BenchmarkCategory("Add")]
    public CaseSensitiveDictionary<int> CaseSensitiveDictonary_Add()
    {
        _caseSensitiveDictionary = new CaseSensitiveDictionary<int>(Capacity);
        for (int i = 100000000; i < 100000010; i++)
        {
            _caseSensitiveDictionary.Add(i.ToString(), i);
        }

        return _caseSensitiveDictionary;
    }

    [Benchmark, BenchmarkCategory("Add")]
    public MultiCaseDictionary<int> MultiCaseDictionary_Add()
    {
        _multiCaseDictionary = new MultiCaseDictionary<int>(Capacity);
        for (int i = 100000000; i < 100000010; i++)
        {
            _multiCaseDictionary.Add(i.ToString(), i);
        }

        return _multiCaseDictionary;
    }
} 

Results benchmarkresults


BenchmarkDotNet=v0.13.2, OS=macOS Catalina 10.15.7 (19H1417) [Darwin 19.6.0]
Intel Core i5-1038NG7 CPU 2.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.400
  [Host]     : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
  Job-NQOSHT : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2

Runtime=.NET 6.0  InvocationCount=50000  

Method Categories Mean Error StdDev Median Ratio RatioSD Gen0 Allocated Alloc Ratio
CaseSensitiveDictonary_Add Add 2,423.0 ns 24.67 ns 20.60 ns 2,418.3 ns 1.00 0.00 10.0000 31440 B 1.00
MultiCaseDictionary_Add Add 3,100.2 ns 61.03 ns 59.94 ns 3,109.2 ns 1.27 0.02 12.8200 40264 B 1.28
CaseSensitiveDictionary_Search TryGetValue 251.8 ns 6.47 ns 18.67 ns 246.1 ns 1.00 0.00 0.0600 240 B 1.00
CaseSensitiveDictionary_InsensitiveSearch TryGetValue 95,573.4 ns 1,888.58 ns 2,768.26 ns 95,401.3 ns 378.77 29.36 0.4000 1280 B 5.33
MultiCaseDictionary_Search TryGetValue 265.5 ns 5.11 ns 10.43 ns 262.2 ns 1.05 0.07 - - 0.00
MultiCaseDictionary_InsensitiveSearch TryGetValue 233.1 ns 4.83 ns 14.16 ns 226.2 ns 0.93 0.09 - - 0.00

Memory consumption will be slightly higher with MultiCaseDictionary, depending on the size of your string key, since both the hash of key and the raw key itself will need to be stored. The larger your key, the more memory will be used. For the current benchmark (smaller keys), the 28% additional memory is reasonable for my use case.

David L
  • 32,885
  • 8
  • 62
  • 93
0

You could use new Dictionary<string,(string CaseSensitiveKey,T Data) where keys are always lowercase (see below), but...

A. User friendlier search string.Contains or Regex.IsMatch

(I added this later)

I think that you may end up using string.Contains (or maybe even Regex.IsMatch) so that your searches can catch partial matches.

Regex.IsMatch

var d = new Dictionary<string, string>() {
      { "First Last", "Some data" },
      { "Fir La", "Some data 2" } };

while (true)
{
    var term = Console.ReadLine();

    // Case-sensitive flag would control RegexOptions
    var results = d.Where( kvp => Regex.IsMatch(kvp.Key, term, RegexOptions.IgnoreCase)).ToList();

    if (results.Any())
        foreach (var kvp in results)
            Console.WriteLine($"\t{kvp.Key}:{kvp.Value}");
    else
        Console.WriteLine("Not found");
}
fi.*la
        First Last:Some data
        Fir La:Some data 2
fir.*t
        First Last:Some data

Contains

    // Case-sensitive flag would control `StrinComparison` flag.
    var results = d.Where(
      kvp => kvp.Key.ToLower().Contains(term.ToLower(), StringComparison.InvariantCultureIgnoreCase))
      .ToList();
}
Fi
Found First Last:Some data
Found Fir La:Some data 2
First
Found First Last:Some data
Fal
Not found

B. I want a dictionary search. And FAST

You could use new Dictionary<string,(string CaseSensitiveKey,T Data) where keys are always lowercase.

This will not work if it's possible to have 'Gerardo Grignoli' and 'gerardo grignoli' in the dictionary, but I suspect that this is not the case in your case because if you're asking for lookups on keys, you don't are not after partial matches. This is obviously just an assumption.

If you're after a fast solution for full matches with handling of entries which differ only by case please see other answers with Dictionary<string, Dictionary<string, TValue>>.

public static T LowerCaseKeyWay<T>(Dictionary<string, (string CaseSensitiveKey, T Data)> d, string term, bool isCS)
          => d.TryGetValue(term.ToLower(), out var item)
                  ? !isCS
                        ? item.Data
                        : term == item.CaseSensitiveKey ? item.Data : default
                  : default;

Example of use.

class SO
{
    public int Number { get; set; }
    public int Rep { get; set; }
}


public static void Main(string[] args)
{

  var d = new Dictionary<string,(string CaseSensitiveKey,SO Data)>() {
    { "Gerardo Grignoli".ToLower(), ("Gerardo Grignoli", new SO { Number=97471, Rep=7987} )},
    { "John Wu".ToLower(), ("John Wu", new SO { Number=2791540, Rep=34973})}
  };

  foreach( var searchTerm in new []{ "Gerardo Grignoli",  "Gerardo Grignoli".ToLower()} )
  foreach( var isSearchCaseSensitive in new[]{true,false} ) {
      Console.WriteLine($"{searchTerm}/case-sensitive:{isSearchCaseSensitive}: {Search(d, searchTerm, isSearchCaseSensitive)?.Rep}");
  }

}

Output

Gerardo Grignoli/case-sensitive:True: 7987
Gerardo Grignoli/case-sensitive:False: 7987
gerardo grignoli/case-sensitive:True: 
gerardo grignoli/case-sensitive:False: 7987

Primitive speed test

Result

noOfSearches: 1000
  noOfItems: 100
    Lowercase key way:        Elapsed 4ms, count found: 1500
    Linq way                  Elapsed 57ms, count found: 1500
noOfSearches: 1000
  noOfItems: 1000
    Lowercase key way:        Elapsed 3ms, count found: 3000
    Linq way                  Elapsed 454ms, count found: 3000
noOfSearches: 10000
  noOfItems: 100
    Lowercase key way:        Elapsed 11ms, count found: 15000
    Linq way                  Elapsed 447ms, count found: 15000
noOfSearches: 10000
  noOfItems: 1000
    Lowercase key way:        Elapsed 10ms, count found: 15000
    Linq way                  Elapsed 5156ms, count found: 15000
noOfSearches: 100000
  noOfItems: 100
    Lowercase key way:        Elapsed 113ms, count found: 150000
    Linq way                  Elapsed 5059ms, count found: 150000
noOfSearches: 100000
  noOfItems: 1000
    Lowercase key way:        Elapsed 83ms, count found: 150000
    Linq way                  Elapsed 48855ms, count found: 150000
noOfSearches: 1000000
  noOfItems: 100
    Lowercase key way:        Elapsed 1279ms, count found: 1500000
    Linq way                  Elapsed 49558ms, count found: 1500000
noOfSearches: 1000000
  noOfItems: 1000
    Lowercase key way:        Elapsed 961ms, count found: 1500000
(...)

Test code (I'm happy for this to be ripped apart)

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace ConsoleApp4
{

    class SO
    {
        public int Number { get; set; }

        public int Rep { get; set; }
    }

  class Program
  {
      public static void Main(string[] args)
      {
        // Preload linq
        var _ = new []{"•`_´•"}.FirstOrDefault( k => k == "(O_O)" );

        foreach( int noOfSearches in new []{1000, 10000, 100000, 1000000} ) 
          foreach( int noOfItems in new []{100, 1000} ) 
          {
            var d1 = new Dictionary<string, SO>();

            for(int i = 0; i < noOfItems; i++) {
              d1.Add($"Name {i}", new SO {Number = i, Rep = i *2});
            }

            var d2 = new Dictionary<string, (string CaseSensitiveKey, SO Data)>();
            foreach (var entry in d1)
            {
                d2.Add(entry.Key.ToLower(), (entry.Key, entry.Value));
            }


            Console.WriteLine($"noOfSearches: {noOfSearches}");
            Console.WriteLine($"  noOfItems: {noOfItems}");

            Console.Write("    Lowercase key way:".PadRight(30));
            PrimitiveSpeedTest( (term, isCS) => LowerCaseKeyWay(d2, term, isCS), noOfItems, noOfSearches);
            Console.Write("    Linq way".PadRight(30));
            PrimitiveSpeedTest( (term, isCS) => LinqWay(d1, term, isCS), noOfItems, noOfSearches);
          }

      }

      private static void PrimitiveSpeedTest(Func<string, bool, SO> search, int noOfItems, int noOfSearches)
      {
          var count = 0;
          Stopwatch sw = Stopwatch.StartNew();
          for (int i = 0; i < noOfSearches; i++)
          {
            var originalTerm = $"Name {i % (noOfItems*2)}"; // Some found, some not found
              foreach (var term in new[] { originalTerm, originalTerm.ToLower() })
                  foreach (var isCS in new[] { true, false })
                  {
                      var so = search(term, isCS);
                      if (so != null) count++;
                      //Console.WriteLine($"{term}/case-sensitive:{isCS}: {Search(d, term, isCS)?.Rep}");
                  }

          }
          var elapsed = sw.Elapsed;

          Console.WriteLine($"Elapsed {sw.ElapsedMilliseconds}ms, count found: {count} ");
      }

      public static SO LowerCaseKeyWay(Dictionary<string, (string CaseSensitiveKey, SO Data)> d, string term, bool isCS)
        => d.TryGetValue(term.ToLower(), out var item)
                ? !isCS
                      ? item.Data
                      : term == item.CaseSensitiveKey ? item.Data : null
                : null;

      static public T LinqWay<T>(Dictionary<string,T> source, string key, bool caseSensitive)
      {
          //Original: if (caseSensitive) return source[key];
          if(caseSensitive) return source.ContainsKey(key) ? source[key] : default;
          key = source.Keys.FirstOrDefault( k => String.Compare(key, k, StringComparison.CurrentCultureIgnoreCase) == 0);
          //Original: if (key == null) throw new KeyNotFoundException();
          if (key == null) return default;
          return source[key];
      }
  }
}
tymtam
  • 31,798
  • 8
  • 86
  • 126
0

Since Dictionary hashes the key, you should use a Dictionary<String, Dictionary<String, T>>.


Adding a key:

  • Convert the given mixed-case key to all lowercase;
  • Get the dictionary to the all lowercase key;
  • Add the to this dictionary.

Case-insensitive search:

  • Convert the mixed-case key to all lowercase;
  • Get the dictionary for this all lowercase key;
  • Iterate over the values in the dictionary.

Case-sensitive search

  • Convert the mixed-case key to all lowercase;
  • Get the dictionary for this all lowercase key;
  • Search for mixed-case key in the dictionary obtained in the step above.
displayName
  • 13,888
  • 8
  • 60
  • 75
  • I am never a fan of having to `ToLower` the key, since someone may forget to do it. I'd prefer the outer `Dictionary` to have a case insensitive comparer instead as per @NetMage's comment. – mjwills Oct 22 '19 at 00:26
  • @mjwills: I agree. I like to write my code fool-proof as well. I figured a more detailed solution will obfuscate the idea. Another detailed solution is to create a wrapper class that encapsulates all this functionality and allows a boolean parameter for case-sensitivity. – displayName Oct 22 '19 at 00:31
  • @mjwills: Another good solution, which would also expand to mentally overloading details is in modifying that hashing algorithm. We can write our own hash implementation that takes into account the size of the underlying structure. That could allow iteration over all keys that are same when case is ignored. – displayName Oct 22 '19 at 00:34
-1

You will definitely not get around writing your own dictioanry (derivate). The first value is a key. As such it is intended only for exact match, not something like non case-sensitve match. Actually it is even worse then that:

I recently learned that Dictionary is also our generic Hashtable. It uses the Hashtable approach (getting a hash for every key and input and comparing that one first), to speed up comparision, especcially on stuff like strings. So when looking up a key, it goes through teh key collection and:

  1. Compare the Hashes. If they do not match, this can not be the key.
  2. If they do match, do the full comparision for the type. Hash colissions ar a thing, so the hash can only be used for early "not a match" filtering.

Your requirements kinda breaks that. Utterly. You actually would end up with not-matches thanks to the hash, when it should match.

The first solution would be to stop trying to do that in Code, and go to a proper DBMS instead. They tend to have support for all the wierd comparisions you might be able to think up. With a lot of ways to speed them up, like indexes. There should be a in-process database out there. But few people are eery willing to go that route.

The second solution I can think up is try to rewrite Dictionary, with as little work as nesseasry. Some ideas:

  • the key should interally stored only in upper or lower case, ragardles what the user entered. I will asume lower case, as it seems intuitive for me and just a call of .toLower() away.
  • you need to store the full key, casing and all. For simplicity I would jsut add a field for htat to the value, asuming you are really certain no one will modify that one.
  • when looking up a key, first use the buildin match for the lowcase version of the input. Then (if nesseary) also check the original key, before you report a match/not match.

You basically add a step 3 to my above listing:

  1. If the lower case of the input matches the (lowercase) key and case sensitivey is required, now check the stored cased key against the cased input

Hopefully would only ahve to modify the add and find routines. Stuff like remove should use the find function to first find the element. It is a bit hacky. Ideally you want to hide the internals of how you do this from the user, so the list of cased keys should be private. Of coruse that means having to touch way more code.

Christopher
  • 9,634
  • 2
  • 17
  • 31