6

I have a std::vector<double> that may contain several NAN values. I want to find the largest element in the vector. How can I efficiently skip the NANs in the comparison? I'd like to avoid having to call isnan on each element. any ideas?

// std::max_element([NAN,NAN,NAN,-31,-89]) = NAN 
// because NAN > -31 returns NAN.
// how can I skip all NANs in the comparison?
// test 2 below is my use case.

#include <vector>
#include <iostream>
#include <cmath>

void vector_max(std::vector<double> v, double &max, int &imax){
    std::vector<double>::iterator v_iter;
    v_iter = std::max_element(v.begin(),v.end());
    imax = std::distance(v.begin(), v_iter);
    max  = *v_iter;
}

int main(){

    std::vector<double> v_vec;
    std::vector<double>::iterator v_vec_iter;
    int imax;
    double val;

    std::cout << "test 1. " << std::endl;

    v_vec.push_back( -33.0 );
    v_vec.push_back( -124.0 );
    v_vec.push_back( -31.0 );
    v_vec.push_back( 18.4 );

    vector_max(v_vec,val,imax);
    std::cout << "max(v_vec) = " << val << std::endl;
    std::cout << "indmax(v_vec) = " << imax << std::endl;

    std::cout << "test 2: my case. " << std::endl;

    v_vec.clear();
    v_vec.push_back( NAN );
    v_vec.push_back( NAN );
    v_vec.push_back( NAN );
    v_vec.push_back( -33.0 );
    v_vec.push_back( -124.0 );
    v_vec.push_back( -31.0 );
    v_vec.push_back( 31.0 );

    vector_max(v_vec,val,imax);
    std::cout << "max(v_vec) = " << val << std::endl;
    std::cout << "indmax(v_vec) = " << imax << std::endl;

};

this returns:

test 1. 
max(v_vec) = 18.4
indmax(v_vec) = 3
test 2. 
max(v_vec) = nan
indmax(v_vec) = 0
Florian Oswald
  • 5,054
  • 5
  • 30
  • 38
  • 3
    Something doesn't make sense here; all comparisons with NaN return `false`, so NaN can _never_ be greater (or less, or equal, etc.) than anything else. Unless you're compiling with `-ffast-math`, then NaNs are treated like 0, meaning NaN is in fact > −31. – mindriot Mar 09 '16 at 10:58
  • 3
    [filter iterator](http://www.boost.org/doc/libs/1_56_0/libs/iterator/doc/filter_iterator.html) may help (even if you call `isnan` on each element). – Jarod42 Mar 09 '16 at 11:00
  • @mindriot I wish you were right. if you care to run that piece of code, you will see that it returns what I just added at the bottom of the question. – Florian Oswald Mar 09 '16 at 11:05
  • 1
    @mindriot: Indeed. The problem here is that if !a – MSalters Mar 09 '16 at 11:15
  • @mindriot Look at that from another angle: NaN is never less than anything and because of that max_element might return NaN. Or might not, because it is never greater than anything. Presence of NaN in sequence makes operator< not conform to strict weak ordering criteria required by all standard algorithm. – Revolver_Ocelot Mar 09 '16 at 11:16
  • @mindriot: that is only true for equality. – Karoly Horvath Mar 09 '16 at 11:17
  • @Revolver_Ocelot yep, see my answer. – mindriot Mar 09 '16 at 11:20
  • Should have been more clear: My first comment was mostly referring to the `// because NAN > -31 returns NAN.` in the source code. – mindriot Mar 09 '16 at 11:21
  • Think of NaN as an error indicator. When you have NaNs in your results something went wrong. Continuing the calculation is almost certainly the wrong thing to do when that happens. Figure out why you're getting NaNs in your results and fix the problem. – Pete Becker Mar 09 '16 at 14:26

3 Answers3

10

You can provide a custom comparison for max_element:

void vector_max(std::vector<double> v, double &max, int &imax){
    std::vector<double>::iterator v_iter;
    v_iter = std::max_element(v.begin(),v.end(),
    [] (auto x, auto y)
    {
        return x < y ? true : isnan(x);
    });
    imax = std::distance(v.begin(), v_iter);
    max  = *v_iter;
}
tahsmith
  • 1,643
  • 1
  • 17
  • 23
  • It is a working answer, however OP stated that he wants to _"avoid having to call isnan on each element"_ – Revolver_Ocelot Mar 09 '16 at 11:17
  • @Revolver_Ocelot So, this doesn't call isnan on each element, but probably most. – tahsmith Mar 09 '16 at 11:32
  • hey @tahsmith this is very neat. is there a way to do this without relying on C++11 features? I can't rely on those unfortunately. – Florian Oswald Mar 09 '16 at 12:37
  • @FlorianOswald, you could implement the comparison in a separate class, as per `mindriot`'s answer, but that makes it substantially less neat in my opinion – tahsmith Mar 09 '16 at 22:36
  • 1
    replace `x < y ? true : isnan(x)` with `not( x >= y )`. nan must always evaluate false, so for no NaN contained you basically get x=y evaluates false, therefore not(x>=y) returns true on NaN – mxmlnkn Jul 20 '17 at 17:01
4

I would try something like this:

void vector_max(std::vector<double> v, double &max, int &imax){
    std::vector<double>::size_type p=0;
    imax = -1;
    max = std::numeric_limits<double>::lowest();

    for (auto &val : v)
    {
        if (!std::isnan(val) && val>max)
        {
            imax = p;
            max = val;
        }
        p++;
    }
}
Florian Oswald
  • 5,054
  • 5
  • 30
  • 38
Juan Leni
  • 6,982
  • 5
  • 55
  • 87
4

The problem is that std::max_element uses std::less as its comparator by default. Depending on the order in which it processes the vector's elements, a NAN may appear on the right-hand side of the comparison. Since all comparisons with NANs return false, this means that NAN can appear greater than all other elements.

Put differently, when you use std::max_element with the default comparator on a vector with NANs, the result is actually undefined since it depends on the implementation, and on the order of elements. For example, on GCC, if I place all your NANs at the end of the vector, I (randomly) get the desired result.

So, you have no other option than to provide your own comparison operator:

#include <vector>
#include <iostream>
#include <cmath>
#include <algorithm>

template <typename T>
struct NaNAwareLess
{
  bool operator () (T a, T b) const
  {
    if (std::isnan(b))
    {
      return false; // Assume NaN is less than *any* non-NaN value.
    }
    if (std::isnan(a))
    {
      return true; // Assume *any* non-NaN value is greater than NaN.
    }
    return (a < b);
  }
};

void vector_max(std::vector<double> v, double &max, int &imax){
    std::vector<double>::iterator v_iter;
    v_iter = std::max_element<std::vector<double>::iterator, NaNAwareLess<double> >(v.begin(),v.end(),NaNAwareLess<double>());
    imax = std::distance(v.begin(), v_iter);
    max  = *v_iter;
}

int main(){

    std::vector<double> v_vec;
    std::vector<double>::iterator v_vec_iter;
    int imax;
    double val;

    std::cout << "test 1. " << std::endl;

    v_vec.push_back( -33.0 );
    v_vec.push_back( -124.0 );
    v_vec.push_back( -31.0 );
    v_vec.push_back( 18.4 );

    vector_max(v_vec,val,imax);
    std::cout << "max(v_vec) = " << val << std::endl;
    std::cout << "indmax(v_vec) = " << imax << std::endl;

    std::cout << "test 2: my case. " << std::endl;

    v_vec.clear();
    v_vec.push_back( NAN );
    v_vec.push_back( NAN );
    v_vec.push_back( NAN );
    v_vec.push_back( -33.0 );
    v_vec.push_back( -124.0 );
    v_vec.push_back( -31.0 );
    v_vec.push_back( 31.0 );

    vector_max(v_vec,val,imax);
    std::cout << "max(v_vec) = " << val << std::endl;
    std::cout << "indmax(v_vec) = " << imax << std::endl;
    std::cout << std::boolalpha << std::less<double>()(NAN, -33.0) << std::endl;
    std::cout << std::boolalpha << std::less<double>()(-33.0, NAN) << std::endl;
};

I don't think you can avoid calling isnan. And there's another important aspect as well: From personal experience, I found performing operations on NAN values to be a lot slower than on any other values (probably because of FPU exception handling). So, while using isnan might be annoying, it could also make quite a positive difference in performance.

mindriot
  • 5,413
  • 1
  • 25
  • 34
  • I see, that sounds good! what's the result of comparing NAN with NAN? – Florian Oswald Mar 09 '16 at 11:23
  • The result of comparing `NAN` to `NAN` is also `false`. So the operator I used here keeps it that way. – mindriot Mar 09 '16 at 11:25
  • Oh, and by the way: The more “C++” way of getting a `NAN` is using [`std::numeric_limits::quiet_NaN`](http://en.cppreference.com/w/cpp/types/numeric_limits/quiet_NaN). – mindriot Mar 09 '16 at 11:27
  • hey @mindriot, so you are saying i should use `std::numeric_limits::quiet_NaN` instead of `cmath` NAN? is it easy to explain why? thanks. – Florian Oswald Mar 09 '16 at 12:39
  • @FlorianOswald You are not _forced_ to use it at all; but there are some reasons why I prefer the `numeric_limits` variant: (1) it gives you appropriate constants for _all_ floating-point types in a clean manner, where as `cmath`'s macro constants are for specific types only. It doesn't matter so much for `NAN`, but look at `HUGE_VAL` vs. `HUGE_VALF` vs. `HUGE_VALL`, for example. (2) The C++ approach from `` _might_ be more portable (I think). … – mindriot Mar 09 '16 at 12:50
  • @FlorianOswald Some aspects that are more of a personal taste: (3) the `limits` variant “looks more like C++”, and [macros are somewhat frowned upon in C++](http://stackoverflow.com/questions/17043090/why-should-i-avoid-macros-in-c). (4) You would use `` rather than `` when developing in C++, so to me it makes sense to use the C++ constant from `` rather than the “old” one from ``. – mindriot Mar 09 '16 at 12:52
  • `binaray_function` is deprecated since C++11. – edmz Mar 09 '16 at 13:28
  • @black Oh, thanks, I wasn't aware of that. Will edit it out. – mindriot Mar 09 '16 at 13:36