
I have a sorted dictionary with DateTime as the key:

var myDictionary = new SortedDictionary<DateTime, float>();

I would like to implement the following function:

float[] GetRange(DateTime from, DateTime to)

The fastest way would be to find the first and last index in the values, then take the data from that range of values.

But to do this, I would need a way to get the index of 'From' and 'To'.

Is there a way to do this? Or is there a faster method to achieve this?

Right now I am looking at having an array for the values and having a dictionary to look up in the data array.

Thomas

2 Answers


Given your need for high performance, you might consider a data structure that allows a binary search for your min and max values. If SortedDictionary provided an indexer Item[int], you could use that, but alas, it does not.

You can consider something like

struct PriceAtTime
{
    public DateTime Timestamp { get; set; }
    public float Price { get; set; } // Or whatever your float represents
}

List<PriceAtTime> myData = GetTheData(); // Assumes the provided data is ordered
                                         // by timestamp.

To find the index of the first data point at or after your minimum timestamp (a sketch in code follows this list):

  • Check myData[myData.Count/2].
  • Depending on that element's timestamp, you either found it; or the middle element is newer than the minimum, so check myData[myData.Count/4]; or it is older than the minimum, so check myData[3*myData.Count/4]. Repeat recursively until you find the right element.
  • Use a similar approach to find the index of the last element that doesn't exceed your max value.
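Here is a minimal sketch of that search, assuming the PriceAtTime struct above and that the list is already sorted by Timestamp; the names RangeSearch, LowerBound, UpperBound, and GetRange are just illustrative, not framework APIs:

using System;
using System.Collections.Generic;

static class RangeSearch
{
    // Index of the first element whose Timestamp is >= from.
    static int LowerBound(List<PriceAtTime> data, DateTime from)
    {
        int lo = 0, hi = data.Count;       // search the half-open range [lo, hi)
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (data[mid].Timestamp < from)
                lo = mid + 1;              // middle is too old, search the right half
            else
                hi = mid;                  // middle is at or after 'from', search the left half
        }
        return lo;
    }

    // Index just past the last element whose Timestamp is <= to.
    static int UpperBound(List<PriceAtTime> data, DateTime to)
    {
        int lo = 0, hi = data.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (data[mid].Timestamp <= to)
                lo = mid + 1;              // still inside the range, search the right half
            else
                hi = mid;                  // middle is past 'to', search the left half
        }
        return lo;
    }

    public static PriceAtTime[] GetRange(List<PriceAtTime> data, DateTime from, DateTime to)
    {
        int first = LowerBound(data, from);
        int last = UpperBound(data, to);   // exclusive upper index
        return data.GetRange(first, Math.Max(0, last - first)).ToArray();
    }
}

Both searches are O(log n), and copying the result is O(k) for k returned elements. List<T>.BinarySearch with a custom IComparer<PriceAtTime> could replace the hand-rolled loops if you prefer.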

SortedList<TKey, TValue> sounds like a promising type but, in reality, it behaves much like a sorted dictionary, especially for keyed lookups.

Note that I assume the elements in the list are magically sorted. A self-sorting data structure can be fairly expensive in a real-time performance environment. If you can obtain the data already sorted from wherever it resides, you eliminate another performance concern.

Eric J.

You could use LINQ to do this:

myDictionary.Where(d => d.Key >= from && d.Key <= to).Select(d => d.Value).ToArray();
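
If you want that in the GetRange shape from the question, a rough sketch (assuming the myDictionary field above and a using System.Linq directive) might look like:

float[] GetRange(DateTime from, DateTime to)
{
    // Enumerates every entry and compares each key, so it is O(n) in the
    // size of the dictionary, but it is short and easy to read.
    return myDictionary
        .Where(d => d.Key >= from && d.Key <= to)
        .Select(d => d.Value)
        .ToArray();
}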
Xipooo
  • but wouldn't that trigger a compare per element? – Thomas Sep 04 '19 at 23:28
  • Yes, but unless your SortedDictionary is absolutely huge or used in a loop in a hotspot in your code, you won't notice the performance hit. – Eric J. Sep 04 '19 at 23:32
  • @Thomas You could use Take and Skip to pare down how many elements you iterate over if that's a known quantity; otherwise you're looking at for loops if performance is REALLY that important. As Eric says though, it's not really that noticeable unless you're dealing with massive quantities of data. – Xipooo Sep 04 '19 at 23:32
  • it is large; it's financial data with dates and I need to get ranges as fast as possible in a context where speed matters. – Thomas Sep 04 '19 at 23:33
  • I can skip the loop if I can identify the index of the from and to dates. – Thomas Sep 04 '19 at 23:34
  • Sorting alone will chew up all the gains you would make trying to go that route. You're better off doing raw for loops. – Xipooo Sep 04 '19 at 23:36
  • I'm with Eric. If you have this stuff in memory already, you should just scan it. If it's really as big as you say, it should probably be in a database and indexed, not held as an in-memory data structure. If you really want it in memory and it takes too long to scan, invest in better hardware, which will be cheaper than paying a programmer to micro-optimize the code, and far less prone to introducing bugs. – John Wu Sep 04 '19 at 23:46
  • The data is already in a DB; the reason I need to cache it in memory is specifically for speed. So far the fastest approach I have is an array of data plus a dictionary, so I can do two dictionary lookups and get two indexes. The data is static and already sorted by time when it arrives, but I'm checking whether there is a faster approach. It's on a dedicated server with 96 GB of RAM; the dataset is roughly 7 GB, but it's chunked in the DB in blocks of about 200 MB, so I need to cache the blocks that are in use and allow fast search within them. – Thomas Sep 05 '19 at 00:00
  • @Thomas This seems the sort of thing better handled by the SQL server as a stored procedure, not an EF LINQ query. You want data parsing to happen as close to the source as possible if performance is that much of a premium. – Xipooo Sep 05 '19 at 00:13
  • Even the most naive implementation with all data in RAM will still beat a server round trip. At this stage, I'm looking for the fastest way without having to give up C#, regardless of the RAM usage. Eventually this will be duplicated, because the processing doesn't fit on a single machine and there is no time to send data to other computers; so this is an end-of-line optimization trying to avoid switching to C/C++. – Thomas Sep 05 '19 at 00:21
  • Yes, even a full array scan in memory beats network IO every time (except for extreme hand-crafted edge cases designed to prove me wrong :-) – Eric J. Sep 05 '19 at 15:31
  • Right, so if the goal is to filter the data, why use the network load to pull ALL the data locally and process it on a smaller machine, when you can use the server's memory in a stored proc to process first and then transmit? – Xipooo Sep 05 '19 at 17:01