3

I'm using Eigen extensively in a scientific application I've been developing for some time. Since I'm implementing a numerical method, a number below a certain threshold (e.g. 1e-15) is not a point of interest, and slowing down calculations, and increasing error rate.

Hence, I want to round off numbers below that threshold to 0. I can do it with a for loop, but hammering multiple relatively big matrices (2M cells and up per matrix) with a for-if loop is expensive and slowing me down since I need to do it multiple times.

Is there a more efficient way to do this with Eigen library?

In other words, I'm trying to eliminate numbers below a certain threshold in my calculation pipeline.

bayindirh
  • 425
  • 6
  • 21

2 Answers2

4

The shortest way to write what you want, is

void foo(Eigen::VectorXf& inout, float threshold)
{
    inout = (threshold < inout.array().abs()).select(inout, 0.0f);
}

However, neither comparisons nor the select method get vectorized by Eigen (as of now).

If speed is essential, you need to either write some manual SIMD code, or write a custom functor which supports a packet method (this uses internal functionality of Eigen, so it is not guaranteed to be stable!):

template<typename Scalar> struct threshold_op {
  Scalar threshold;
  threshold_op(const Scalar& value) : threshold(value) {}
  EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const{
    return threshold < std::abs(a) ? a : Scalar(0); }
  template<typename Packet>
  EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const {
    using namespace Eigen::internal;
    return pand(pcmp_lt(pset1<Packet>(threshold),pabs(a)), a);
  }
};
namespace Eigen { namespace internal {

template<typename Scalar>
struct functor_traits<threshold_op<Scalar> >
{ enum {
    Cost = 3*NumTraits<Scalar>::AddCost,
    PacketAccess = packet_traits<Scalar>::HasAbs };
};

}}

This can then be passed to unaryExpr:

inout = inout.unaryExpr(threshold_op<float>(threshold));

Godbolt-Demo (should work with SSE/AVX/AVX512/NEON/...): https://godbolt.org/z/bslATI


It might actually be that the only reason for your slowdown are subnormal numbers. In that case, a simple

_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

should do the trick (cf: Why does changing 0.1f to 0 slow down performance by 10x?)

chtz
  • 17,329
  • 4
  • 26
  • 56
  • Thanks for the tips and the information. I'm aware of the slowdown subnormal numbers are creating, that's also partly why I'm explicitly avoiding numbers smaller than 1e-15. Because it's bordering in subnormal territory and they are not useful for my calculations. – bayindirh Feb 03 '19 at 18:19
  • 1
    What I meant to say is that you can probably save the effort of manually flushing values to zero and just rely on the automatic functionality of your CPU. – chtz Feb 03 '19 at 18:29
  • 1
    Also thanks for that, it's greatly appreciated. Sorry if my comment sounded kinda terse. English is not my native language. I was trying to say "You're right, I'm also trying to prevent subnormal floats, that's spot on!". Actually, I was looking for ways to enable it in a cross-platform way right now. – bayindirh Feb 03 '19 at 18:33
  • 2
    Looking at your godbolt-demo, the `.select()` version is automatically vectorized by GCC, but not clang. – ggael Feb 03 '19 at 21:51
1

Eigen has a method called UnaryExpr which applies a given function pointer to every coefficient in a matrix (it has sparse and array variants too).

Will test its performance and update this answer accordingly.

bayindirh
  • 425
  • 6
  • 21