35

The following code is supposed to find the key 3.0in a std::map which exists. But due to floating point precision it won't be found.

map<double, double> mymap;
mymap[3.0] = 1.0;

double t = 0.0;
for(int i = 0; i < 31; i++)
{
  t += 0.1;
  bool contains = (mymap.count(t) > 0);
}

In the above example, contains will always be false. My current workaround is just multiply t by 0.1 instead of adding 0.1, like this:

for(int i = 0; i < 31; i++)
{
  t = 0.1 * i;
  bool contains = (mymap.count(t) > 0);
}

Now the question:

Is there a way to introduce a fuzzyCompare to the std::map if I use double keys? The common solution for floating point number comparison is usually something like a-b < epsilon. But I don't see a straightforward way to do this with std::map. Do I really have to encapsulate the double type in a class and overwrite operator<(...) to implement this functionality?

pokey909
  • 1,797
  • 1
  • 16
  • 22
  • 4
    How close is close enough? It sounds like you might actually want to store and look up via rounded keys. – Cascabel Jul 13 '11 at 19:43
  • 2
    In a sort of reverse to your workaround, and if the floats are all of a specified digit precision, you could instead store the keys as integers, the floating-point values being multiplied by some scale factor and stored like that. – JAB Jul 13 '11 at 20:06
  • I thought about that, but I will run into trouble depending on the resolution I might overflow easily. – pokey909 Jul 13 '11 at 20:20
  • 1
    You should be aware that even your workaround could fail - you got lucky on the rounding for the multiply. `0.1` cannot be represented precisely in base 2. – Mark Ransom Jul 13 '11 at 20:31
  • yes I know. That's why I wanted a real solution. But you must admit, it's a nice example to show off how float comparisons can lead to undefined behaviour. :-) – pokey909 Jul 13 '11 at 21:24
  • 2
    See http://stackoverflow.com/questions/4816156/are-ieee-floats-valid-key-types-for-stdmap-and-stdset – Robᵩ Jul 15 '11 at 17:20
  • Your code is working correctly. When you add the `double` `0.1` to itself ten times, you do not get `1.0`. – tmyklebu May 12 '14 at 14:40
  • In your exact case you should probably used "fixed point" keys. double can usefully be used as a key in a map when you plan to use this for interpolation or making a histogram. For either of these you are going to use lower_bound to define which range a value falls in and how far between the boundaries it falls. – CashCow Oct 13 '16 at 14:13

5 Answers5

31

So there are a few issues with using doubles as keys in a std::map.

First, NaN, which compares less than itself is a problem. If there is any chance of NaN being inserted, use this:

struct safe_double_less {
  bool operator()(double left, double right) const {
    bool leftNaN = std::isnan(left);
    bool rightNaN = std::isnan(right);
    if (leftNaN != rightNaN)
      return leftNaN<rightNaN;
    return left<right;
  }
};

but that may be overly paranoid. Do not, I repeat do not, include an epsilon threshold in your comparison operator you pass to a std::set or the like: this will violate the ordering requirements of the container, and result in unpredictable undefined behavior.

(I placed NaN as greater than all doubles, including +inf, in my ordering, for no good reason. Less than all doubles would also work).

So either use the default operator<, or the above safe_double_less, or something similar.

Next, I would advise using a std::multimap or std::multiset, because you should be expecting multiple values for each lookup. You might as well make content management an everyday thing, instead of a corner case, to increase the test coverage of your code. (I would rarely recommend these containers) Plus this blocks operator[], which is not advised to be used when you are using floating point keys.

The point where you want to use an epsilon is when you query the container. Instead of using the direct interface, create a helper function like this:

// works on both `const` and non-`const` associative containers:
template<class Container>
auto my_equal_range( Container&& container, double target, double epsilon = 0.00001 )
-> decltype( container.equal_range(target) )
{
  auto lower = container.lower_bound( target-epsilon );
  auto upper = container.upper_bound( target+epsilon );
  return std::make_pair(lower, upper);
}

which works on both std::map and std::set (and multi versions).

(In a more modern code base, I'd expect a range<?> object that is a better thing to return from an equal_range function. But for now, I'll make it compatible with equal_range).

This finds a range of things whose keys are "sufficiently close" to the one you are asking for, while the container maintains its ordering guarantees internally and doesn't execute undefined behavior.

To test for existence of a key, do this:

template<typename Container>
bool key_exists( Container const& container, double target, double epsilon = 0.00001 ) {
  auto range = my_equal_range(container, target, epsilon);
  return range.first != range.second;
}

and if you want to delete/replace entries, you should deal with the possibility that there might be more than one entry hit.

The shorter answer is "don't use floating point values as keys for std::set and std::map", because it is a bit of a hassle.

If you do use floating point keys for std::set or std::map, almost certainly never do a .find or a [] on them, as that is highly highly likely to be a source of bugs. You can use it for an automatically sorted collection of stuff, so long as exact order doesn't matter (ie, that one particular 1.0 is ahead or behind or exactly on the same spot as another 1.0). Even then, I'd go with a multimap/multiset, as relying on collisions or lack thereof is not something I'd rely upon.

Reasoning about the exact value of IEEE floating point values is difficult, and fragility of code relying on it is common.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • Why is my_equal_range and key_exists using move semantics? it should instead be `const Container& container` . not only is moving unnecessary, it doesn't work either since after calling my_equal_range/key_exists, the moved container will be empty after the call. – JDiMatteo Oct 28 '14 at 18:30
  • 1
    @JDiMatteo Those are universal references, not rvalue references (move semantics). `my_equal_range` may want a non-`const` `Container` in order to return non-`const_iterator`s. By using the above technique, it works for both. `key_exists` could use `Container const&` admittedly. Finally, remember that `move` does not move, and neither do `&&` references -- moving requires constructing another object. – Yakk - Adam Nevraumont Oct 28 '14 at 18:33
  • thanks for the correction, I had never heard of universal references before, and I'm just reading up on it now. "move does not move" -- I guess I don't know as much C++11 as I thought. – JDiMatteo Oct 28 '14 at 18:35
  • key_exists implementation is wrong. e.g. if set is {1, 2, 3}, and target is 100 with epsilon 0.00001, lower is end and upper is end. I think it should be should be `return range.first != range.second;` e.g. see some test cases here: http://ideone.com/oc0PhH . (I tried to edit the answer but the edit required at least a 6 char change which seems silly for code.) – JDiMatteo Oct 28 '14 at 21:03
  • Thanks for correcting that -- this working example is really handy. – JDiMatteo Oct 28 '14 at 21:56
  • I agree with your answer other than the "do not use floating point values in a map" because assuming you want a sorted associative container based on this, you might want to do so. The issue is more what you are using it for, and whether you are looking for exact values or some kind of ranging e.g. histogram or points to interpolate between. – CashCow Oct 13 '16 at 14:11
  • @CashCow True. "Do not look up floating point values in a map" might be a better wording. `.find` and `[]` and the like are likely to be sources of bugs. Added a paragraph or three on the subject. – Yakk - Adam Nevraumont Oct 13 '16 at 14:32
  • I'm curious if using a struct of two or more floats as the POD key is as problematic as using a single float in a map? if the struct has an operator< implemented with std::tie, then that should handle the ordering automatically with stable guarantees? – johnbakers Jan 04 '17 at 10:03
  • @johnbakers Yes, and no (NaN). – Yakk - Adam Nevraumont Aug 11 '22 at 14:12
7

Here's a simplified example of how using soft-compare (aka epsilon or almost equal) can lead to problems.

Let epsilon = 2 for simplicity. Put 1 and 4 into your map. It now might look like this:

1
 \
  4

So 1 is the tree root.

Now put in the numbers 2, 3, 4 in that order. Each will replace the root, because it compares equal to it. So then you have

4
 \
  4

which is already broken. (Assume no attempt to rebalance the tree is made.) We can keep going with 5, 6, 7:

7
 \
  4

and this is even more broken, because now if we ask whether 4 is in there, it will say "no", and if we ask for an iterator for values less than 7, it won't include 4.

Though I must say that I've used maps based on this flawed fuzzy compare operator numerous times in the past, and whenever I digged up a bug, it was never due to this. This is because datasets in my application areas never actually amount to stress-testing this problem.

Evgeni Sergeev
  • 22,495
  • 17
  • 107
  • 124
  • 2
    If you want to see a real-world case where comparison-using-epsilon wrecks everything, try using a `set` or `map` in something like Fortune's algorithm for Voronoi diagrams. There, you're *only* looking at a location in the tree if the two keys are about to become equal. – tmyklebu May 12 '14 at 14:37
  • @tmyklebu Considering points in 2D or N-D space is opening up another can of worms on top, because then @Yakk's approach can't be extended in a straightforward way (consider many points on an axis-aligned line with a bit of noise: two almost nearby points might be located far away in the order). In that case I would go with putting a grid on the space and converting to `long long` coordinates on that grid. Making sure that the domain of the problem is within, but also of the order of the range of `long long`, for best results. – Evgeni Sergeev May 12 '14 at 14:47
  • 1
    @EvgeniSergeev The right way to do N dimensional spacial lookup is with Quad/Oct/K-D/R-trees or similar systems. You do a query for all points within `epsilon` distance (either Manhattan or Euclidean) of your target when looking if something is at a location. – Yakk - Adam Nevraumont Aug 08 '14 at 20:43
  • @Yakk Yes, but that adds a lot of overhead. Sometimes a simple integer grid will suffice. Especially when you're certain not to have multiple points within an epsilon of each other. This allows us to use the already-available `map`, as in `map, object>`. For the idea to work, the element should be placed into each cell that overlaps a circle with epsilon radius (or a square, for Manhattan distance) around its actual location. So it makes sense to have a grid pitch around, or less than, an epsilon. – Evgeni Sergeev Aug 09 '14 at 04:07
  • The purpose of an epsilon is to be negligible with respect to your resolution. Here, it is twice your resolution. – Korchkidu Oct 09 '15 at 08:57
  • @Korchkidu Are you implying a provably correct algorithm for dealing with arbitrary floats based on some concept of "resolution"? If so, explain it more fully! – Evgeni Sergeev Oct 21 '15 at 13:44
  • @EvgeniSergeev: no, I am just saying that the purpose of an epsilon is to be negligible with respect to the smallest floats you need to work with. – Korchkidu Oct 22 '15 at 12:25
  • @Korchkidu You're right, but do you see how the logic of my post carries over to that situation? It's just easier to see what's going on when we zoom in on the problem this way. – Evgeni Sergeev Oct 23 '15 at 12:14
  • I probably did not fully understand this... I tried this with std::map and eps 2: first I added 1 and 4, then 2, 3, 4, ... . When printing the first key of the map, it always shows 1, not 7. Hence, I fail to see how the map is broken. – mfnx Oct 17 '20 at 23:00
3

As Naszta says, you can implement your own comparison function. What he leaves out is the key to making it work - you must make sure that the function always returns false for any values that are within your tolerance for equivalence.

return (abs(left - right) > epsilon) && (left < right);

Edit: as pointed out in many comments to this answer and others, there is a possibility for this to turn out badly if the values you feed it are arbitrarily distributed, because you can't guarantee that !(a<b) and !(b<c) results in !(a<c). This would not be a problem in the question as asked, because the numbers in question are clustered around 0.1 increments; as long as your epsilon is large enough to account for all possible rounding errors but is less than 0.05, it will be reliable. It is vitally important that the keys to the map are never closer than 2*epsilon apart.

Community
  • 1
  • 1
Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
  • 1
    Consdier `a`, `b=a+3ε/4`, `c=a+3ε/2`. We have `(a – Robᵩ Jul 15 '11 at 17:17
  • @Rob, yes it does. You need some out-of-band way of ensuring that the values stay out of the problem area between 1ε and 2ε apart from each other. This condition was satisfied in the original problem as defined. – Mark Ransom Jul 15 '11 at 17:24
  • 1
    -1: See an example in my answer that exposes the flaw with this idea. For many practical purposes this will work, but it's not good to encourage the spread of an idea that has a flaw, when reasonable alternatives are available. – Evgeni Sergeev May 12 '14 at 14:34
  • @EvgeniSergeev the flaw only presents itself when the keys are arbitrarily close together compared to the epsilon value, as shown by your own answer. I've edited the answer to identify the conditions that are necessary for reliable application of this method. – Mark Ransom Jan 04 '17 at 23:59
1

You could implement own compare function.

#include <functional>

class own_double_less : public std::binary_function<double,double,bool>
{
public:
  own_double_less( double arg_ = 1e-7 ) : epsilon(arg_) {}
  bool operator()( const double &left, const double &right  ) const
  {
    // you can choose other way to make decision
    // (The original version is: return left < right;) 
    return (abs(left - right) > epsilon) && (left < right);
  }
  double epsilon;
};
// your map:
map<double,double,own_double_less> mymap;

Updated: see Item 40 in Effective STL! Updated based on suggestions.

codeling
  • 11,056
  • 4
  • 42
  • 71
Naszta
  • 7,560
  • 2
  • 33
  • 49
  • @Andre Holzner: I know. I just wanted to show the base algorthm. I think he could change it free. Update: I modified the code. – Naszta Jul 13 '11 at 19:55
  • 13
    This version can result in undefined behavior because it gives inconsistent results depending on the order of the parameters. If a == b, own_double_less(a,b) returns true but so does own_double_less(b,a). They can't both be less than each other! – Mark Ransom Jul 13 '11 at 20:09
  • 6
    If you go this route, keep in mind that the comparator has to create a strict weak ordering. `comp(a, b) && comp(b, c)` implies `comp(a, c)`, `comp(a, a) == false`, and `comp(a, b)` implies `!comp(b, a)`. This can be easy to muck up... for instance `own_double_less(-1, -1) == true` if `epsilon > 0` – Dennis Zickefoose Jul 13 '11 at 20:10
  • @Dennis, thanks for coming up with the more complete set of comparator rules than I did. I think you can get away with breaking your first rule (the transitive property) if you can ensure that the values a,b,c will never attain values that break the function. – Mark Ransom Jul 13 '11 at 20:25
  • My first comment can be ignored, the edit borrows the correction from my own answer. – Mark Ransom Jul 13 '11 at 20:36
  • Thanks for your great help Naszta, Mark and everyone else. I tested this one and it works as expected. – pokey909 Jul 13 '11 at 21:23
  • 6
    I agree with the other commenters. This solution is wrong in the general case because a == b && b == c => a == c will not always hold and your map will get screwed up. – jcoffland Jan 24 '12 at 09:26
  • Say for argument sake your epsilon is 0.1 and a=1.06, b=1 and c=0.94. Then a == b and b == c but a != c. std::map does not like this. If you can be sure that all your inputs are normally sufficiently spread apart except for floating point errors then it will work otherwise not. – jcoffland Jan 24 '12 at 09:31
  • 6
    Do not use this answer it is broken. – Yakk - Adam Nevraumont Nov 12 '12 at 21:21
  • 9
    To be clear, using the above code in a `std::map` can easily lead to the `std::map` performing undefined behavior (including, in practice, crashing, looping infinitely, or blowing the stack). @jcoffland gives an explicit example of how this can happen. Both this answer, and @MarkRansom's answer, are reckless things to do to a `std::map`. – Yakk - Adam Nevraumont Nov 13 '12 at 20:52
  • 4
    -1: See an example in my answer that exposes the flaw. For many practical purposes this will work, but it's not good to encourage the spread of an idea that has a flaw, when reasonable alternatives are available. – Evgeni Sergeev May 12 '14 at 14:33
  • 3
    @Naszta: If you've "tested this to work," you haven't tested it enough. – tmyklebu May 12 '14 at 14:34
-2

Using doubles as keys is not useful. As soon as you make any arithmetic on the keys you are not sure what exact values they have and hence cannot use them for indexing the map. The only sensible usage would be that the keys are constant.

Jiri Kriz
  • 9,192
  • 3
  • 29
  • 36
  • 1
    I disagree here. Using doubles as keys for intervals is infact very useful. Especially the weak ordering property lets you find boundaries quickly. If you are aware of the imprecisions you can define a tolerance as demonstrated above. – pokey909 Jul 13 '11 at 21:29
  • 2
    I think it will be very difficult to specify a proper epsilon as is always the case with real (floating) numbers. If eps is too large, close numbers cannot be inserted because they are considered equal. If it is too small, a slightly modified key cannot be retrieved because it is outside the eps-interval. The [interval arithmetic](http://en.wikipedia.org/wiki/Interval_arithmetic) is a difficult topic. – Jiri Kriz Jul 13 '11 at 21:55
  • true, but then again you run into this kind of trouble whenever you are dealing with floating points. But in my case I can specify a reasonable epsilon. Fortunatly I don't want to implement an arbitrary precision interval container :-) As you noted, things will get more complex then. – pokey909 Jul 13 '11 at 22:18