
In C++ I was able to use std::map<double, T>, an ordered dictionary implemented as a red-black tree, which gives me O(lg n) for both insert and search. I could check whether a value existed within some epsilon of a key by combining the map's lower_bound and upper_bound.

I have not been able to find the same thing while using C# 7+/.NET Core. Does such a thing exist?

In pseudocode, I'd like to do something like this:

Map<float, T> map = ...
//         key    epsilon  newValue
map.Insert(0.5f,  0.1f,    someObj);  // No values in the map, inserts fine
map.Get(   0.45f, 0.1f);              // 0.45 +/- 0.1 contains 0.5, would return someObj
map.Get(   0.3f,  0.1f);              // 0.3 +/- 0.1 does not include 0.5, it is not found
map.Insert(0.55f, 0.1f, anotherObj);  // 0.55 +/- 0.1 includes 0.5, replace someObj
map.Insert(0.35f, 0.1f, anObj);       // 0.35 +/- 0.1 doesn't overlap, insert new value

The alternative would be to roll my own self-balancing binary search tree, but I'd rather not reinvent the wheel if such a thing already exists.

I've been looking at SortedDictionary, but its Keys property is a plain collection, so I can't jump around (binary search) in it. The same issue applies to OrderedDictionary, unless I missed something.

I may not be able to use a SortedList: there will be more insertions than lookups, and since it is backed by an array, each insertion at a random position shifts the later elements. I'm assuming a uniform distribution in my input (very likely the case given the data I'm working with), so insertions land near the middle on average, costing about n/2 shifts each and leaving me at O(n) per insertion. With a binary search tree I'd get O(lg n), so that otherwise good solution may not be applicable.

Most importantly, this algorithm runs in a very hot section of the code. Performance is extremely important; choosing something slow could drastically hurt the application. I really need O(lg n), or some novel approach I haven't thought of.

Water
    You might want to upvote https://github.com/dotnet/corefx/issues/30953 – Lesiak May 02 '19 at 21:50
  • Possible duplicate of [How to get the closest item to my key from a SortedDictionary?](https://stackoverflow.com/questions/11691342/how-to-get-the-closest-item-to-my-key-from-a-sorteddictionary) – Brennan Vincent May 02 '19 at 22:21
    This just isn't different between C++ and C#, std::map is implemented with a [red-black tree](https://stackoverflow.com/questions/5288320/why-is-stdmap-implemented-as-a-red-black-tree). So is [SortedDictionary](https://stackoverflow.com/questions/14909853/is-sorteddictionary-a-red-black-tree), there are not that many ways to do this differently. But std::map with a float for the key is a bug, you can reimplement that bug with your fingers crossed. And the Project > Properties > Build tab, "Prefer 32-bit" unchecked to make the fingers lucky. – Hans Passant May 02 '19 at 22:28
  • @HansPassant can you explain what you mean? – Brennan Vincent May 02 '19 at 22:30
  • It is well-covered everywhere, Googling "don't compare floating point for equality" is a decent way to find what's out there. – Hans Passant May 02 '19 at 22:39
  • What do you mean by "can't jump around in it" and "random order" in sortedList? – Magnus May 03 '19 at 06:12
  • @HansPassant While comparing floats by equality is a bad idea in general, if you first specify a range, iterate over it to find the exact float that is already a key, and then use that found key as the argument to a get, this won't be a problem. Is there anything wrong with this approach? – Lesiak May 03 '19 at 06:47

2 Answers


My idea is to combine two data structures: a SortedSet and a regular Dictionary.

SortedSet has a GetViewBetween method which, on .NET Core, has the expected O(log n) performance: https://github.com/dotnet/corefx/pull/30921

Note: the expected performance of this method is met only in .NET core, it was much slower in the past: Why SortedSet<T>.GetViewBetween isn't O(log N)?

In the set you keep only the float keys. Additionally, you keep a Dictionary from float to your desired type. You perform operations on the Dictionary only after consulting the SortedSet.

I realize there are some rough edges (e.g., when an interval contains several entries in the SortedSet), but I believe this is equivalent to the C++ implementation.

Hope you find this helpful, good luck with the implementation.

Lesiak

While the answer I'm about to give was profiled in C++ rather than C#, it solves the problem in a much better and faster way.

The better way to solve this is to multiply the floating-point key by the inverse of the epsilon and truncate it to an integer. For example, if your epsilon is 0.25, you'd multiply all your floats/doubles by 4 and then cast to an integer (or floor/ceil it if you care about values clustering around zero). The following uses int as the key, but it would work just as well with long. My data fits in the +/- 2^31 range after quantizing (on machines where int is at least 4 bytes), so this is sufficient for me.

#include <cmath>
#include <type_traits>
#include <unordered_map>

template <typename K, typename V>
class QuantizedGrid {
    static_assert(std::is_floating_point_v<K>, "K must be a floating-point type");

    int quantizer;
    std::unordered_map<int, V> map;

    // Bucket index; truncates toward zero (use std::floor/std::ceil here if
    // values clustering around zero are a concern).
    int bucket(K k) const { return static_cast<int>(quantizer * k); }

public:
    explicit QuantizedGrid(const double epsilon)
        // Round rather than truncate: 1/epsilon may not be exactly representable.
        : quantizer(static_cast<int>(std::lround(1.0 / epsilon))) {}

    V& operator[](const K k) { return map[bucket(k)]; }

    bool contains(const K k) const { return map.count(bucket(k)) > 0; }
};

Compared to using upper/lower bound checks, it was about 650% faster to convert to an integer and insert into a dictionary that supports O(1) amortized insertion/lookup/delete.

It is also far less code than implementing a custom upper/lower bound search.

My guess is that the O(lg n) BST lookup time is much worse than the O(1) dictionary time, and that the cost of casting a float to an int is small enough that the whole thing is bound by data-structure lookups and cache behavior.

Water