1

I have an exponential filter, written in c, that is used to filter an output value towards an input value, given a time-base and time-constant.

void Exp_Filt(float In, float *Out, float time_base, float time_constant) {
  *Out = In + ((*Out - In) * expf(-time_base / time_constant));
}

My problem is, if In==0, Out!=0, and time_base is significantly less than time_constant (both of which are >0), then Out will never reach the value of 0.0 no matter how many times Exp_Filt is called.

For example, the following loop will run forever:

for (float Out = 1.0f; Out != 0.0f;)
{
  Exp_Filt(0.0f, &Out, 0.1f, 1.0f);
}

Is there any way to account for the precision round-off error of IEEE 754 32bit floating-point that is keeping Out from reaching 0 without needing a multitude of conditional statements?

Adam
  • 562
  • 2
  • 15
  • [How dangerous is it to compare floating point values?](https://stackoverflow.com/questions/10334688/how-dangerous-is-it-to-compare-floating-point-values/10335601) – tshiono Aug 09 '21 at 23:03
  • Filters are generally used to model or control physical phenomena. At some point, the difference between `Out` and its target `In` becomes physically insignificant—a sound difference is below human ability to distinguish, a radio signal is lost in the noise, a machine cannot be controlled so precisely. Generally, `Out != 0.0f` can be replaced by `fabsf(Out-In) < threshold`. But what is the context in which this appears? If the goal were simply to achieve the final state of `for (float Out = 1.0f; Out != 0.0f;){ Exp_Filt(0.0f, &Out, 0.1f, 1.0f); }`, then it is easily done with `Out = In`. – Eric Postpischil Aug 09 '21 at 23:12
  • After some iterations you are multiplying `pow(2,min_exp)` by a number between 1 and 0.5, which gets rounded up to `pow(2,min_exp)` (under most rounding behaviors). – chtz Aug 09 '21 at 23:20
  • @chtz: That is fairly apparent; the question is what to do about it. Also, “`min_exp`” is commonly regarded to be the exponent for the minimum normal value, as in the C Standard’s `FLT_MIN_EXP` (modulo scaling of the significand). The value in question is the minimum positive value, 2^(1-p+minimum exponent), where p is the precision (number of bits in the significand). – Eric Postpischil Aug 09 '21 at 23:24

2 Answers2

0

Is there any way to account for the precision round-off error of IEEE 754 32bit floating-point that is keeping Out from reaching 0 without needing a multitude of conditional statements?

expf(-0.1f / 1.0f) is 0.904837..., so any multiplication of FLT_TRUE_MIN will, with round-to-nearest, rounds to FLT_TRUE_MIN, so we are stuck in a loop.

An alternative to is to use Monte Carlo rounding

void Exp_Filt_MC(float In, float *Out, float time_base, float time_constant) {
  if (rand()&1) {
    fesetround(FE_UPWARD);
  } else {
    fesetround(FE_DOWNWARD);
  }
  *Out = In + ((*Out - In) * expf(-time_base / time_constant));

  // Note 
  // Better could would restore the rounding mode of function entry.
}

This will eventually exit the loop as sometimes FLT_TRUE_MIN*0.904837 will be 0.0.

There are variations on a theme possible here or limiting to only do this when (*Out - In) is sub-normal.

Yet with "without needing a multitude of conditional statements?" this seems simplest.


I also like the idea that if *Out, at the end of the function, is a sub-normal, just return 0.0f.

This is akin to

for (float Out = 1.0f; fabsf(Out) <= FLT_MIN; )
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
  • Subnormals and `Out!=0` are red herrings. The title asks about going to a final value, and the same problem occurs with other values: `Out` gets to 1 ULP from `In`, multiplying that by .9 (or other values in (.5, 1)) yields more than ½ the ULP of `In`, and adding that to `In` yields `Out`. The loop should have been written more generically, as in `float In = TargetValue; for (float Out = StartValue; Out != In;) Exp_Filt(In, &Out, 0.1f, 1.0f);` – Eric Postpischil Aug 10 '21 at 02:05
  • Using `fesetround` inside a loop (or repetitively called function) is likely to have horrible performance. On many processors, changing the floating-point rounding mode blocks the instruction queue for a while. – Eric Postpischil Aug 10 '21 at 02:06
  • @EricPostpischilq Agree about changing rounding mode has time impact, yet code is calling `expf()`. I suspect OP's time concern is not the drving issue here. – chux - Reinstate Monica Aug 10 '21 at 03:44
  • @chux In general, I like your answer, but for performance, I would simplify it to only calculate if `(*Out - In) >= FLT_MIN`, otherwise set `*Out = In;`. – Adam Aug 10 '21 at 15:22
  • @Adam I would think you want _absolute value_ or `fabsf(*Out - In) >= FLT_MIN`. Also, be careful about the edge case. I'd recommend `fabsf(*Out - In) > FLT_MIN`. – chux - Reinstate Monica Aug 10 '21 at 15:27
-3

Because you're losing precision, you should do intermediate calculations in higher precision (e.g. double) and only convert back at the end.

So, you're better off doing:

void
Exp_Filt(double In, float *Out, double time_base, double time_constant)
{
    *Out = In + ((*Out - In) * exp(-time_base / time_constant));
}

Also, you may never get "perfect" convergence to 0.0. So your loop might be better off with:

#define ERRVAL 1e-5 // some smallish value ...
for (float Out = 1.0f; Out > ERRVAL;)

UPDATE:

Setting the rounding mode via fesetround seems to have an effect -- see below. In particular, doing fesetround(FE_DOWNWARD) [once] does get convergence to zero.

Also, I get slightly different results depending upon what -std= option is given.

I created a test program:

/* all.c */

#include <stdio.h>
#include <math.h>
#include <fenv.h>

#ifndef ITER
#define ITER    1000000
#endif

void
Exp_Filt_orig(float In, float *Out, float time_base, float time_constant)
{
    *Out = In + ((*Out - In) * expf(-time_base / time_constant));
}

void
Exp_Filt_fix1(double In, float *Out, double time_base, double time_constant)
{
    *Out = In + ((*Out - In) * exp(-time_base / time_constant));
}

double
Exp_Filt_fix2(double In, double Out, double time_base, double time_constant)
{
    return In + ((Out - In) * exp(-time_base / time_constant));
}

#define DOLOOP(_fnc,_lim) \
    do { \
        float Out; \
        iter = 0; \
        for (Out = 1.0f;  (_lim) && (iter < ITER); ++iter) \
            _fnc(0.0f, &Out, 0.1f, 1.0f); \
        printf("fnc=%s lim='%s' iter=%d Out=%.9g\n", \
            #_fnc,#_lim,iter,(double) Out); \
    } while (0)

#define DOLOOP2(_fnc,_lim) \
    do { \
        double Out; \
        iter = 0; \
        for (Out = 1.0f;  (_lim) && (iter < ITER); ++iter) \
            Out = _fnc(0.0f, Out, 0.1f, 1.0f); \
        printf("fnc=%s lim='%s' iter=%d Out=%.9g\n", \
            #_fnc,#_lim,iter,(double) Out); \
    } while (0)

#define LIM     1e-5
#define LIM2    1e-10

int
main(void)
{
    int iter;

#ifdef __STDC__VERSION__
    printf("std=%ld\n",__STDC_VERSION__);
#endif
    printf("LIM=%.6g\n",LIM);
    printf("LIM2=%.6g\n",LIM2);
    printf("ITER=%d\n",ITER);
    printf("fegetround=%8.8X\n",fegetround());
#ifdef FE
    fesetround(FE);
    printf("fegetround=%8.8X\n",fegetround());
#endif

    DOLOOP(Exp_Filt_orig,Out != 0.0f);
    DOLOOP(Exp_Filt_orig,Out > LIM);
    DOLOOP(Exp_Filt_fix1,Out > LIM);
    DOLOOP2(Exp_Filt_fix2,Out > LIM);
    DOLOOP2(Exp_Filt_fix2,Out > LIM2);
    DOLOOP2(Exp_Filt_fix2,Out != 0.0);

    return 0;
}

Here is a Makefile for it:

ALL += allxx all90 all99

ifdef O
  OLVL := -O$(O)
endif

ifdef ITER
  XFLAGS += -DITER=$(ITER)
endif

ifdef FE
  XFLAGS += -DFE=$(FE)
endif

.PHONY: all
all: $(ALL)

allxx: all.c
    cc -o allxx all.c -lm $(OLVL) $(XFLAGS)

all90: all.c
    cc -o all90 all.c -lm $(OLVL) $(XFLAGS) -std=c90

all99: all.c
    cc -o all99 all.c -lm $(OLVL) $(XFLAGS) -std=c99

run: all
    ./all90
    ./all99
    ./allxx

clean:
    rm -f $(ALL)

Here is the output of make clean run:

rm -f allxx all90 all99
cc -o allxx all.c -lm
cc -o all90 all.c -lm   -std=c90
cc -o all99 all.c -lm   -std=c99
./all90
LIM=1e-05
LIM2=1e-10
ITER=1000000
fegetround=00000000
fnc=Exp_Filt_orig lim='Out != 0.0f' iter=1 Out=0
fnc=Exp_Filt_orig lim='Out > LIM' iter=1 Out=0
fnc=Exp_Filt_fix1 lim='Out > LIM' iter=116 Out=9.16608587e-06
fnc=Exp_Filt_fix2 lim='Out > LIM' iter=116 Out=9.16608615e-06
fnc=Exp_Filt_fix2 lim='Out > LIM2' iter=231 Out=9.28532947e-11
fnc=Exp_Filt_fix2 lim='Out != 0.0' iter=1000000 Out=2.47032823e-323
./all99
LIM=1e-05
LIM2=1e-10
ITER=1000000
fegetround=00000000
fnc=Exp_Filt_orig lim='Out != 0.0f' iter=1000000 Out=7.00649232e-45
fnc=Exp_Filt_orig lim='Out > LIM' iter=116 Out=9.16610225e-06
fnc=Exp_Filt_fix1 lim='Out > LIM' iter=116 Out=9.16608587e-06
fnc=Exp_Filt_fix2 lim='Out > LIM' iter=116 Out=9.16608615e-06
fnc=Exp_Filt_fix2 lim='Out > LIM2' iter=231 Out=9.28532947e-11
fnc=Exp_Filt_fix2 lim='Out != 0.0' iter=1000000 Out=2.47032823e-323
./allxx
LIM=1e-05
LIM2=1e-10
ITER=1000000
fegetround=00000000
fnc=Exp_Filt_orig lim='Out != 0.0f' iter=1000000 Out=7.00649232e-45
fnc=Exp_Filt_orig lim='Out > LIM' iter=116 Out=9.16610225e-06
fnc=Exp_Filt_fix1 lim='Out > LIM' iter=116 Out=9.16608587e-06
fnc=Exp_Filt_fix2 lim='Out > LIM' iter=116 Out=9.16608615e-06
fnc=Exp_Filt_fix2 lim='Out > LIM2' iter=231 Out=9.28532947e-11
fnc=Exp_Filt_fix2 lim='Out != 0.0' iter=1000000 Out=2.47032823e-323

However, here is the output of make FE=FE_DOWNWARD clean run:

rm -f allxx all90 all99
cc -o allxx all.c -lm  -DFE=FE_DOWNWARD
cc -o all90 all.c -lm  -DFE=FE_DOWNWARD -std=c90
cc -o all99 all.c -lm  -DFE=FE_DOWNWARD -std=c99
./all90
LIM=1e-05
LIM2=1e-10
ITER=1000000
fegetround=00000000
fegetround=00000400
fnc=Exp_Filt_orig lim='Out != 0.0f' iter=1000000 Out=3.40282346e+38
fnc=Exp_Filt_orig lim='Out > LIM' iter=1000000 Out=3.40282346e+38
fnc=Exp_Filt_fix1 lim='Out > LIM' iter=116 Out=9.16604039e-06
fnc=Exp_Filt_fix2 lim='Out > LIM' iter=116 Out=9.16608615e-06
fnc=Exp_Filt_fix2 lim='Out > LIM2' iter=231 Out=9.28532947e-11
fnc=Exp_Filt_fix2 lim='Out != 0.0' iter=7427 Out=0
./all99
LIM=1e-05
LIM2=1e-10
ITER=1000000
fegetround=00000000
fegetround=00000400
fnc=Exp_Filt_orig lim='Out != 0.0f' iter=1015 Out=0
fnc=Exp_Filt_orig lim='Out > LIM' iter=116 Out=9.16599037e-06
fnc=Exp_Filt_fix1 lim='Out > LIM' iter=116 Out=9.16604039e-06
fnc=Exp_Filt_fix2 lim='Out > LIM' iter=116 Out=9.16608615e-06
fnc=Exp_Filt_fix2 lim='Out > LIM2' iter=231 Out=9.28532947e-11
fnc=Exp_Filt_fix2 lim='Out != 0.0' iter=7427 Out=0
./allxx
LIM=1e-05
LIM2=1e-10
ITER=1000000
fegetround=00000000
fegetround=00000400
fnc=Exp_Filt_orig lim='Out != 0.0f' iter=1015 Out=0
fnc=Exp_Filt_orig lim='Out > LIM' iter=116 Out=9.16599037e-06
fnc=Exp_Filt_fix1 lim='Out > LIM' iter=116 Out=9.16604039e-06
fnc=Exp_Filt_fix2 lim='Out > LIM' iter=116 Out=9.16608615e-06
fnc=Exp_Filt_fix2 lim='Out > LIM2' iter=231 Out=9.28532947e-11
fnc=Exp_Filt_fix2 lim='Out != 0.0' iter=7427 Out=0

Edit: Original post getting DV'ed:

When you call a given function, if one of the function's parameters is float, it is automatically passed by the calling function as a double and interpreted as a double by the called function.

That is, In, time_base, and time_constant are double even though you specified float. Only Out [because it's a pointer to float and not passed by value], is a float.

And, in expressions, a float is automatically promoted to a double during evaluation.

Craig Estey
  • 30,627
  • 4
  • 24
  • 48
  • 1
    (a) The default argument promotions (which convert `float` to `double`) apply only to arguments corresponding to `...` in a function declaration or arguments to a function without a prototype. When a parameter is declared `float` in a prototype, it is a `float`. (b) `float` values are not automatically promoted to `double` during evaluation (although a C implementation may evaluate using more precision and range than the nominal type). (c) Replacing the original `Exp_Filt` with the one in this answer does not resolve the issue asked about in the question. – Eric Postpischil Aug 09 '21 at 23:04
  • @EricPostpischil Fair enough. However, OP is [concerned about] losing precision [and convergence], so changing args to `double` helps that. And, the function will run [slightly] faster [because the args don't have to be promoted/converted to `double` in the expression. – Craig Estey Aug 09 '21 at 23:09
  • No, OP is not concerned with losing precision; that is not the issue. And, again, there is no promotion or conversion to `double`. See (b) above. – Eric Postpischil Aug 09 '21 at 23:12
  • About the edit: The function in the question **does not resolve the problem**. – Eric Postpischil Aug 09 '21 at 23:25
  • C90 onwards passes `float` as `float` if there is a prototype in scope and it doesn't have `...` ellipsis complicating the issue. – Jonathan Leffler Aug 09 '21 at 23:32
  • @EricPostpischil Not even the change to the loop? In what way is that dissimilar to your top comment? So, for _my_ edification, please post an answer rather than just comments here that say [effectively] "you're doing it wrong" without a suggestion as to how to improve the answer [sufficiently]. I'm always ready to fix/improve an answer. And, when you say "does not resolve the problem", you said "question", so are you talking about OP's function or mine? – Craig Estey Aug 09 '21 at 23:34
  • @JonathanLeffler Thanks for that--I was beginning to feel gaslighted because I remember the `float->double` args thing [and was not aware that this changed--An answer to Fortran's touted speed advantage I guess]. – Craig Estey Aug 09 '21 at 23:42