We have a hashtable (specifically the C# `Dictionary<TKey,TValue>` class) that holds several thousand to millions of (key, value) pairs for near-O(1) search hits/misses.
We'd like to be able to flush this data structure to disk (serialize it) and load it again later (deserialize) such that the internal hashtable state of the Dictionary is preserved.
What we do right now:
- Load from disk => `List<KVEntity>`. (`KVEntity` is serializable. We use Avro to serialize; we can drop Avro if needed.)
- Read every `KVEntity` from the array into the dictionary. This regenerates the dictionary's internal hashtable state.
- < System operates; the Dictionary can grow/shrink, values change, etc. >
- When saving, read from the dictionary into an array (via `myKVDict.Values.ToList()` into a new `List<KVEntity>`).
- Serialize the array (`List<KVEntity>`) to disk to save the raw data.
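The round-trip above looks roughly like this (a minimal sketch: `System.Text.Json` stands in for our Avro serialization, and `kvStorePath` is a hypothetical file path):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

// Save: Dictionary -> List<KVEntity> -> disk
List<KVEntity> snapshot = myKVDict.Values.ToList();
File.WriteAllText(kvStorePath, JsonSerializer.Serialize(snapshot));

// Load: disk -> List<KVEntity> -> Dictionary (this rebuilds the internal hashtable)
List<KVEntity> loaded =
    JsonSerializer.Deserialize<List<KVEntity>>(File.ReadAllText(kvStorePath));
Dictionary<string, KVEntity> myKVDictReloaded = loaded.ToDictionary(e => e.myKey);
```

The `ToDictionary` call on load is the rehashing step we'd like to eliminate.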
Notice that during our save/restore we lose the internal hashtable/dictionary state and have to rebuild it each time.
We'd like to serialize to/from the Dictionary directly (including its internal "live" state) instead of going through an intermediate array just for the disk I/O. How can we do that?
Some pseudo code:

```csharp
// The actual "node" that holds information. Both myKey and myValue have actual data worth storing.
public class KVEntity
{
    public string myKey { get; set; }
    public DataClass myValue { get; set; }
}

// Unit of disk IO/serialization
public List<KVEntity> myKVList { get; set; }

// Unit of run-time processing. The string key is KVEntity.myKey.
public Dictionary<string, KVEntity> myKVDict { get; set; }
```
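For comparison, the call shape we are after would be something like the following (again a sketch, with `System.Text.Json` as a stand-in serializer and `kvStorePath` a hypothetical path). Note that general-purpose serializers that accept a `Dictionary<TKey,TValue>` directly typically still write out the key/value pairs and re-insert them on deserialize, so the bucket state is rebuilt anyway:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Desired: serialize the Dictionary itself, with no intermediate List<KVEntity>
File.WriteAllText(kvStorePath, JsonSerializer.Serialize(myKVDict));

Dictionary<string, KVEntity> restored =
    JsonSerializer.Deserialize<Dictionary<string, KVEntity>>(File.ReadAllText(kvStorePath));
```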