0

I have n layers of classes representing data repositories, that store a value keyed on a tuple of DateTime and TimeSpan. The purpose of the "layers" is as a caching mechanism; the bottom-most layer is the slow data origin, and the top-most layer serves as a fast in-memory cache.

Think of these layers as a group: the group wrapper class receives a request for values between time x and y. It first creates a list of timeslots that represent the time range between x and y, and then it calls each layer in turn to get values for those timeslots.

At the moment, I have this performing reasonably well by using a HashSet<T> of time slots. First, the top-most layer is asked for matching results by being passed this HashSet<T>. For each slot returned, I call Remove() on the HashSet<T>.

Next, the layer further down is called with the same (but now potentially smaller) HashSet<T>. And so-on, so that each layer is only asked to fulfil the time slots that haven't already been loaded by a provider further up in the stack.

The reason for this layered structure is to allow for automatically "filling" any missing timeslots in the data. The in-memory cache layer must be free to deallocate memory as needed, but fill the gaps by loading from layers below when asked again for the data.

In performance profiling, the constructor of the HashSet<T> (wherein it must check for uniqueness of the source sequence of timeslots) and the subsequent calls to Remove() represent about a 45% overhead versus just calling the in-memory provider directly outside of a group setting.

Is there another solution to this kind of problem? Another way of thinking of my goal is that I have a Dictionary<(DateTime, TimeSpan), TValue> and I'm looking to - repeatedly - find which values are missing as fast as possible over millions of keys.

Many thanks in advance!

Alex Norcliffe
  • 2,439
  • 2
  • 17
  • 21
  • 6
    I think adding some code of what should be "fast" could help in getting (better) answers. – Julian Feb 09 '20 at 01:49
  • What does a `(DateTime, TimeSpan)` represents? A time period that starts at a specific date and time, with a specific duration? – Theodor Zoulias Feb 09 '20 at 05:40
  • @TheodorZoulias precissly - a slot of time – Alex Norcliffe Feb 09 '20 at 07:27
  • @Julian I can try to craft a contrived example tomorrow, but right now HashSet is the fastest and I'm looking for creative ways of step-change in performance when evaluating the goal "which vales for a key collection are empty?". I've tried a List.BinarySearch, alternative OSS HashSet implementations, but nothing moves the needle enough – Alex Norcliffe Feb 09 '20 at 07:31
  • Is it possible to exist overlapping slots of time in the collection? – Theodor Zoulias Feb 09 '20 at 08:00
  • `the constructor of the HashSet` - Did you mean `Add` method instead of constructor? ; you mention the `Remove` method later in that same discussion, so `Add` makes more sense. Add/Remove operations should be quick unless the default `IEqualityComparer.GetHashCode` method is not yielding a good distribution for the values being added. You could try supplying the `HashSet` a custom `IEqualityComparer` implementation. You indicate that that the value is a tuple, [see the answers to this question](https://stackoverflow.com/q/955982/2592875) for a discussion on some of the issues with tuples. – TnTinMn Feb 09 '20 at 17:17

0 Answers0