1

I wish to associate a memory cached data structure with a set of interned strings and use a passed instance of an interned string to lookup its associated data structure.

The predefined set of strings will be around 1000 in number. Cache population costs can be ignored but I want high performance lookup.

public class InternedExtras
{
  public DateTime Prop1 {get; set; }
  public Decimal Prop2 {get; set; }
}

Ideally I would create a Dictionary keyed on an interned string's reference but .Net does not expose object references as a specific type.

If I declare my Dictionary as:

Dictionary<string, InternedExtras>

then I am concerned that the System.String equality override will invoke char by char string value comparison during dictionary lookup, which will be inefficient.

An option would be:

Dictionary<int, InternedExtras> _extrasDictionary

InternedExtras GetInternedExtras( string knownToBeInterned )
{
  return _extrasDictionary[ knownToBeInterned.GetHashCode() ];
}

However I have never fully understood hash code maths and understand uniqueness is not guaranteed.

The average length of my interned strings is 50 chars and I can deploy to the latest .Net version.

camelCase
  • 1,549
  • 2
  • 16
  • 28
  • 1
    I know its old, but if you have total control over the code, another option is to create a `struct` wrapper around your interned strings that contains a reference to the `InternedExtras` and just use your wrapper. You could make an `implicit operator string(Wrapper)` so you could use it as a string, just don't make an `implicit operator Wrapper(string)` since he would have to do a lookup of some sort to find the relevant `InternedExtras`, but tbh, unless you profiled and determined `string.GetHashCode()` is a bottleneck, I wouldn't do anything, its only 1000 elements, odds of collision are low – mhand Sep 04 '21 at 07:23

1 Answers1

1

I actually think this is your most efficient option:

Dictionary<string, InternedExtras> _extrasDictionary;

Doing a looking as follows is actually very efficient!

InternedExtras extras = _extrasDictionary[interned];

The char by char comparison that you refer to will only be called on a small subset of strings. This is because interned.GetHashCode() will be used to group they keys into "buckets".

This question has much more details on the subject:

How does a hash table work?

Community
  • 1
  • 1
dana
  • 17,267
  • 6
  • 64
  • 88
  • @ dana - Thankyou for the reassurance about Dictionary/HashCode performance. Would I be correct in thinking that System.String does not store a string's hash code between calls to someString.GetHashCode? – camelCase Mar 09 '17 at 20:43
  • 1
    I don't actually think it does (see source code ref below). I just know that `Dictionary` is the de-facto way to do fast in-memory lookups and is part of the BCL (i.e. implemented by Microsoft). If you have any doubts, you should run some performance tests. https://referencesource.microsoft.com/#mscorlib/system/string.cs – dana Mar 09 '17 at 20:54