I have n layers of classes representing data repositories that store a value keyed on a tuple of `DateTime` and `TimeSpan`. The purpose of the "layers" is as a caching mechanism; the bottom-most layer is the slow data origin, and the top-most layer serves as a fast in-memory cache.
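To make the structure concrete, each layer looks roughly like this (a minimal sketch; `TimeSlot` and `IDataLayer<TValue>` are illustrative names, not my real types):

```csharp
using System;
using System.Collections.Generic;

// Illustrative key type: one time slot, identified by its start and length.
// A record struct gives value equality, which HashSet<T> and Dictionary<,> rely on.
public readonly record struct TimeSlot(DateTime Start, TimeSpan Length);

// Illustrative layer contract. Each layer returns whatever it can for the
// requested slots; anything it cannot supply is left for the layer below.
public interface IDataLayer<TValue>
{
    IReadOnlyDictionary<TimeSlot, TValue> GetValues(HashSet<TimeSlot> requestedSlots);
}
```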
Think of these layers as a group: the group wrapper class receives a request for values between times x and y. It first builds a list of timeslots representing the range between x and y, and then calls each layer in turn to get values for those timeslots.
At the moment, I have this performing reasonably well by using a `HashSet<T>` of time slots. First, the top-most layer is asked for matching results by being passed this `HashSet<T>`. For each slot returned, I call `Remove()` on the `HashSet<T>`.
Next, the layer further down is called with the same (but now potentially smaller) `HashSet<T>`, and so on, so that each layer is only asked to fulfil the time slots that haven't already been loaded by a provider further up the stack.
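In code, the group's lookup is roughly the following (again a sketch with illustrative names; `_layers` and `EnumerateSlots` stand in for my actual members, and the method lives on a generic group class with a `TValue` type parameter):

```csharp
public IDictionary<TimeSlot, TValue> GetRange(DateTime from, DateTime to, TimeSpan slotLength)
{
    // Build the full set of slots covering [from, to); this constructor,
    // with its uniqueness check, is part of the measured overhead.
    var remaining = new HashSet<TimeSlot>(EnumerateSlots(from, to, slotLength));
    var results = new Dictionary<TimeSlot, TValue>();

    // _layers is ordered fastest (in-memory cache) to slowest (data origin).
    foreach (var layer in _layers)
    {
        if (remaining.Count == 0)
            break; // everything already fulfilled by faster layers

        foreach (var kvp in layer.GetValues(remaining))
        {
            results[kvp.Key] = kvp.Value;
            remaining.Remove(kvp.Key); // these Remove() calls are the other hot spot
        }
    }

    return results;
}

private static IEnumerable<TimeSlot> EnumerateSlots(DateTime from, DateTime to, TimeSpan slotLength)
{
    for (var start = from; start < to; start += slotLength)
        yield return new TimeSlot(start, slotLength);
}
```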
The reason for this layered structure is to allow missing timeslots to be "filled" automatically. The in-memory cache layer must be free to deallocate memory as needed, and it then fills the gaps by loading from the layers below when the data is requested again.
In performance profiling, the `HashSet<T>` constructor (which must check the source sequence of timeslots for uniqueness) and the subsequent calls to `Remove()` represent roughly 45% overhead compared with calling the in-memory provider directly, outside of the group setting.
Is there another way to approach this kind of problem? Another way of framing my goal: I have a `Dictionary<(DateTime, TimeSpan), TValue>` and I'm looking to repeatedly find which values are missing, as fast as possible, over millions of keys.
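Put another way, the hot path boils down to something like this, executed repeatedly over millions of keys (a simplified sketch; `FindMissing` only illustrates the operation, it isn't code I actually have):

```csharp
// Return the requested keys that the cache dictionary does not yet contain.
static List<(DateTime, TimeSpan)> FindMissing<TValue>(
    IEnumerable<(DateTime, TimeSpan)> requestedKeys,
    Dictionary<(DateTime, TimeSpan), TValue> cache)
{
    var missing = new List<(DateTime, TimeSpan)>();
    foreach (var key in requestedKeys)
    {
        if (!cache.ContainsKey(key))
            missing.Add(key);
    }
    return missing;
}
```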
Many thanks in advance!