replacing "-01" with "-02" intel compiler option leads to throwing FPE at small for loop trip counts

Question

Here is the code:

#include <math.h>
#include <iostream>
#include <omp.h>
#include <fenv.h>
#include <signal.h>

void handler(int sig)
{
  printf("Floating Point Exception\n");
  exit(0);
}
const float alpha=1.5;
const unsigned int N=2;
struct Particle
{
  float x, y, z;
  float m;
};
Particle p[N] __attribute__((aligned(64)));
void interaction()
{
  double P=0.0;
#pragma omp parallel for reduction(+:P)
  for(int i=0; i<N; ++i)
  {
    float PP=0.0;
  #pragma simd reduction(+:PP)
  //#pragma novector
    for(int j=0; j<N; ++j) if(i!=j)
    {
      float rdist1=sqrtf((p[i].x-p[j].x)*(p[i].x-p[j].x)+(p[i].y-p[j].y)*(p[i].y-p[j].y)+(p[i].z-p[j].z)*(p[i].z-p[j].z));
      PP+=alpha/rdist1;
    }
    P+=PP;
  }
  std::cout<<"P="<<P<<std::endl;
}
void randomizeBodies()
{
  double pot_en=0.;
  const double pi=3.1415926536;   
  float RADIUS=pow(10.0*N,1.0/3.0);
#pragma omp single
#pragma novector
  for(int i=0; i<N; ++i)
  {
    float DISTANCE=0.0f;
    if(i>0)
    {
      while(DISTANCE<=1.0f)
      {
        float theta=pi*static_cast <float> (rand())/(static_cast <float> (RAND_MAX));
        float phi=2*pi*static_cast <float> (rand())/(static_cast <float> (RAND_MAX));
        float rr=RADIUS*static_cast <float> (rand())/(static_cast <float> (RAND_MAX));
        p[i].x =rr*sin(theta)*cos(phi);
        p[i].y =rr*sin(theta)*sin(phi);
        p[i].z =rr*cos(theta);
        DISTANCE=10000.0f;
      #pragma simd reduction(min:DISTANCE)      
        for(int j=0; j<i; ++j)
        {
          float dij=sqrt((p[i].x-p[j].x)*(p[i].x-p[j].x)+(p[i].y-p[j].y)*(p[i].y-p[j].y)+(p[i].z-p[j].z)*(p[i].z-p[j].z));
          if(dij<DISTANCE) DISTANCE=dij;
        }
      }
    }
    else
    {
      float theta=pi*static_cast <float> (rand())/(static_cast <float> (RAND_MAX));
      float phi=2*pi*static_cast <float> (rand())/(static_cast <float> (RAND_MAX));
      float rr=RADIUS*static_cast <float> (rand())/(static_cast <float> (RAND_MAX));
      p[i].x =rr*sin(theta)*cos(phi);
      p[i].y =rr*sin(theta)*sin(phi);
      p[i].z =rr*cos(theta);
    }  
  }
#pragma omp parallel for reduction(+:pot_en)
  for(int i=0; i<N; ++i)
  {
    float pp=0.0;
  #pragma simd reduction(+:pp)
  //#pragma novector
    for(int j=0; j<N;  ++j) if(i!=j)
    {
      float rd=sqrtf((p[i].x-p[j].x)*(p[i].x-p[j].x)+(p[i].y-p[j].y)*(p[i].y-p[j].y)+(p[i].z-p[j].z)*(p[i].z-p[j].z));
      pp+=alpha/rd;
    }
    pot_en+=pp;
  }
  pot_en/=2;
  std::cout<<"P="<<pot_en<<std::endl;
}
int main(int argc, char **argv)
{
  feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW);
  signal(SIGFPE, handler);
  randomizeBodies();
  interaction();
}

I compile it using:

-DCMAKE_CXX_COMPILER=icpc -DCMAKE_CXX_FLAGS="-O2"

and with "-g" it gives the following output

Program received signal SIGFPE, Arithmetic exception.
0x00000000004024ab in randomizeBodies ()
at /home/70-gaa/source/GPU/ERROR24/error.cpp:90
90            pp+=alpha/rd;
(gdb)

at N=2,4. But at N>=8 it works. If to comment

 #pragma simd reduction

and uncomment

 #pragma novector

in both for loops, everything works at any N. If to use "-01", everything also works at any N.
If to to use vectorization and "-O2" or "-O3", the program throws a floating point exception at N=2,4, but at N>=8 works. Why? In principle, i need to use the following compile line: -DCMAKE_CXX_FLAGS="-march=native -mtune=native -ipo16 -fp-model fast=2 -O3 -qopt-report=5 -mcmodel=large". But "-O3" doesn't work. I work on Intel Core i7-3770 8-core CPU and use Intel compiler icpc version 17.0.1.

The general answer is the same as [adding "-march=native" intel compiler flag to the compilation line leads to a floating point exception on KNL](https://stackoverflow.com/q/52592058). ICC without without `-fp-model except` allows the compiler to make asm that causes FP exceptions the source wouldn't, so this kind of thing is what you should expect with `feenableexcept`. Are you only asking about what specific optimization / code-gen strategy causes it in this case? It would be helpful if you showed the asm instruction that raised the fault. — Peter Cordes, Oct 04 '18 at 10:48
Supposedly there is a vector division there, and as N<8, it executes a division by zero for the unused parts of the register (I suppose that AVX is used). — geza, Oct 04 '18 at 10:57
I can't use -fp-model except. Here is the compiler error output when i try to use "-O1 -fp-model except" compile line: icpc: command line error: -fp-model except conflicts with the default -fp-model fast option. How to override it? — And, Oct 04 '18 at 11:10
Is there a guide how to print the asm instruction that raised the fault while debugging with gdb? Or if You don't know assembler it isn't worth trying it? — And, Oct 04 '18 at 11:25
You want conflicting things. You enable FPU exceptions, yet you lie to the compiler that it is not enabled. Why do you want such a thing? — geza, Oct 04 '18 at 11:30
"except – Enforces floating point exception semantics." (Quick Reference Guide to Optimization Intel® C++ and Fortran Compilers v15). I thought that this option enables floating point exception handling by the compiler, isn't it? I try to compile with "-O0 -fp-model except" or just "-O0". In my full code i strongly need to use "-O2" or "-O3" optimization level option. I don't understand what are the conflicting things. — And, Oct 04 '18 at 11:42

score 0 · Answer 1 · answered Oct 04 '18 at 17:45

0

The general answer is the same as adding "-march=native" intel compiler flag to the compilation line leads to a floating point exception on KNL. ICC without -fp-model except allows the compiler to make asm that causes FP exceptions the source wouldn't, so this kind of thing is what you should expect with feenableexcept.

You haven't shown enough asm to give a specific answer as to exactly what optimization trick led to this in this specific case.

In my full code i strongly need to use "-O2" or "-O3" optimization level option. I don't understand what are the conflicting things.

So read the Intel doc I linked in my previous answer, which explains it clearly.

There are 3 "groups" of options (precise/fast/strict), (width), and (except or not). The default is fast=1 for that group.

User and Reference Guide for the Intel® C++ Compiler 15.0 fp-model, fp

Since fast is the default, you must not specify except without a group A or group B keyword.

You can safely use either of these:

-O3 -fp-model precise -fp-model except with feenableexcept()
-O3 -fp-model fast=2 with your fully-optimized release/production version that doesn't use feenableexcept().

You must use -fp-model except if you want to use feenableexcept(). Otherwise the compiler will assume that causing an FP exception is not a visible side-effect of your program.

answered Oct 04 '18 at 17:45

Peter Cordes

328,167
45
605
847

Thank You! "Since fast is the default, you must not specify except without a group A or group B keyword" - that's why i couldn't compile with the only option "except" yesterday, understood. Maybe it is a foolish question, but i don't understand why if to comment feenableexcept(), everything works at any N, and if to uncomment - throws fpe at N=2,4. I thought that if there is fpe, the program should crash or give an not a number result or something regardless of feenableexcept() is on or off. How does it work and give, it seemes, a right result when really there is a floating point exception? – And Oct 05 '18 at 07:30
"The floating-point exceptions are not related to the C++ exceptions. When a floating-point operation raises a floating-point exception, the status of the floating-point environment changes, which can be tested with std::fetestexcept, but the execution of a C++ program on most implementations continues uninterrupted"(https://en.cppreference.com/w/cpp/numeric/fenv). How the program can execute further when there is a floating point exception? What values will it work with (if there is an fpe, how the program defines the result of the operation causing fpe)? – And Oct 05 '18 at 07:36
And more: "GNU libc function feenableexcept() enables trapping of the floating-point exceptions, which generates the signal SIGFPE. If the compiler option -fnon-call-exceptions was used, the handler for that signal may throw a user-defined C++ exception" (en.cppreference.com/w/cpp/numeric/fenv). Undersood, if to enable feenableexcept(), the program will throw fp exceptions, if not - it will not. But how it works further when an fpe occurs (how the program defines the result of the operation causing fpe and goes further)? Please, explain. – And Oct 05 '18 at 07:46
@And: It gives the right result because the FP exception is in a temporary that isn't used in that case (e.g. try the fast way, fall back to the slow way on NaN), or in a SIMD vector element that's blended away and doesn't become part of the final result. The whole point of `-fp-model no-except` is to enable optimizations that have this side-effect. – Peter Cordes Oct 05 '18 at 09:57
What does it mean: "a SIMD vector element that's blended away and doesn't become part of the final result" and "FP exception is in a temporary that isn't used in that case"? Please, explain. You want to say that values for the cases when FPE trapping is enabled or not are kept in different places of the memory (or in different SIMD vector elements, as You said) and when throwing of fpe is disabled, the right values are taken which give the right result (and the elements for the case of fpe are simply thrown away)? I'm willing to understand, because yet i don't. – And Oct 05 '18 at 13:40
@And: I mean things like [What is meant by "fixing up" floats?](https://stackoverflow.com/q/30213615). Or where positive vs. negative numbers need different handling, but you do both ways for a whole vector and then blend the results together. So you can generate FP exceptions from the computations along the "wrong" path as part of branchlessly computing both ways. – Peter Cordes Oct 05 '18 at 14:59
Tnank You very much, Peter Cordes! – And Oct 08 '18 at 11:02

replacing "-01" with "-02" intel compiler option leads to throwing FPE at small for loop trip counts

1 Answers1