1

Here is some simple code for an stm32f4:

void main(void)
{
    float sum = 1.0;
    uint32_t cnt = 0;

    while(1)
    {
        for( cnt = 0; cnt < 1000; cnt++ )
            sum += 2.0e-08;

        printfUsart("%f\r\n", 
                            sum
                            );
    }
}

The value of the variable sum never changes. If I add a larger value in the loop instead, sum += 2.0e-07;, it does increase. I use the "gcc-arm-none-eabi-4_9-2014q4" compiler with these compile and linker flags:

PROCESSOR = -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16

So, how do I work with very small float values? I need this to run Matlab-generated code in stm32f4 firmware that implements some filtering functions.

user3583807
  • it's [`int main(void)`, not `void main()`](http://stackoverflow.com/questions/204476/what-should-main-return-in-c-and-c) – phuclv Mar 05 '15 at 17:14
  • I think void main() is okay, because we have an embedded micro-controller. There is no need of returning a value, because main has no calling thread or OS. – Lui Mar 06 '15 at 08:31
  • With floats, definitely use Kahan sum, but also consider multiplying everything by 10^6 or 10^8 during (and before) the loop as well...and then dividing afterwards. That's typically how similar issues are resolved in fixed point. – bunkerdive Mar 07 '15 at 12:20

2 Answers

2

The IEEE 754 binary32 floating-point format has 24 bits of precision, which amounts to approximately 7 decimal digits (the correspondence is not exact because binary is not decimal).

This is not enough to distinguish 1 and 1.00000002. The binary32 value immediately above 1.0f is exactly 1.00000011920928955078125.
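
To make the rounding concrete, here is a minimal sketch that can be compiled on a PC. It assumes `float` is IEEE 754 binary32, and the helper names `increment_is_lost` and `half_spacing_above_one` are made up for this illustration. An increment smaller than half the gap above 1.0f is rounded away, which is why 2.0e-08 disappears while 2.0e-07 does not.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if base + inc rounds back to base
 * in single precision, 0 otherwise. The volatile store forces the
 * result to be rounded to float even on targets that keep
 * intermediates in a wider format. */
int increment_is_lost(float base, float inc)
{
    volatile float result = base + inc;
    return result == base;
}

/* Half the gap between 1.0f and the next float above it.
 * FLT_EPSILON is that gap (2^-23, about 1.19e-07). */
float half_spacing_above_one(void)
{
    return FLT_EPSILON / 2.0f;
}
```

Under that assumption, `increment_is_lost(1.0f, 2.0e-08f)` returns 1 and `increment_is_lost(1.0f, 2.0e-07f)` returns 0, because the threshold from `half_spacing_above_one()` is about 5.96e-08.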

Your available options are

  1. to use the type double for the variable sum, assuming that double is mapped to IEEE 754 binary64 with its 53 bits of precision, or
  2. to improve accuracy by using a better summation algorithm. The most famous is Kahan's:

    void main(void)
    {
      float sum = 1.0;
      uint32_t cnt = 0;
      float c = 0;
    
      while(1)
      {
        for( cnt = 0; cnt < 1000; cnt++ )
        {
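            float y = 2.0e-08f - c;  /* increment, corrected by the error carried over */
            float t = sum + y;       /* low-order bits of y are lost in this addition */
            c = (t - sum) - y;       /* (what was really added) - y = minus the lost part */
            sum = t;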
        }
    
        printfUsart("%f\r\n", sum);
      }
    }
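
For comparing the options off-target, here is a rough sketch of the three approaches as plain functions. The names `naive_sum`, `kahan_sum` and `double_sum` are invented for this example, and it assumes IEEE 754 `float` and `double`. Two caveats worth keeping in mind: with `-mfpu=fpv4-sp-d16` the FPU handles only single precision, so `double` arithmetic is done in software, and GCC's `-ffast-math` allows the compiler to reassociate the Kahan expressions and cancel the compensation term.

```c
#include <stdint.h>

/* Plain float accumulation: each tiny increment is rounded away. */
float naive_sum(float start, float inc, uint32_t n)
{
    volatile float sum = start;
    for (uint32_t i = 0; i < n; i++)
        sum = sum + inc;
    return sum;
}

/* Kahan (compensated) summation in float. The volatile qualifiers
 * keep every intermediate rounded to float and stop the compiler
 * from simplifying the compensation away. */
float kahan_sum(float start, float inc, uint32_t n)
{
    volatile float sum = start;
    volatile float c = 0.0f;           /* running compensation */
    for (uint32_t i = 0; i < n; i++) {
        volatile float y = inc - c;    /* corrected increment */
        volatile float t = sum + y;    /* may lose low bits of y */
        c = (t - sum) - y;             /* minus the lost part */
        sum = t;
    }
    return sum;
}

/* Accumulating in double: 53 bits of precision absorb the increment. */
double double_sum(double start, double inc, uint32_t n)
{
    double sum = start;
    for (uint32_t i = 0; i < n; i++)
        sum += inc;
    return sum;
}
```

With a start value of 1.0, an increment of 2.0e-08 and 1000 iterations, `naive_sum` stays at exactly 1.0f, while `kahan_sum` and `double_sum` both end up close to 1.00002.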
    
Pascal Cuoq
0

Another possibility is to optimize your filtering functions with respect to the coefficients used.

Here is a manually optimized version of your example:

void main(void)
{
    float sum = 1.0;
    uint32_t cnt = 0, temp = 0;

    while(1)
    {
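        temp = 0;                       /* restart the integer accumulator for this batch */
        for( cnt = 0; cnt < 1000; cnt++ )
            temp += 2;                  /* exact integer arithmetic, in units of 1.0e-08 */
        sum = sum + temp * 1.0e-08f;    /* a single float multiply and add per batch */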

        printfUsart("%f\r\n", 
                            sum
                            );
    }
}

A disadvantage is that you have to do this optimization manually and that it is not generic, but it can save a lot of computing time because there are fewer floating-point operations.
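
The same idea can be written as a small reusable function. This is only a sketch: the name `add_batched` and its parameters are hypothetical, and it assumes the total unit count fits in a `uint32_t` and that the batched total is large enough (more than half the float spacing at `sum`) to survive the final addition.

```c
#include <stdint.h>

/* Hypothetical helper: adds `steps` increments of
 * (units_per_step * unit) to sum. The increments are accumulated
 * exactly as integers and converted to float only once. */
float add_batched(float sum, uint32_t steps,
                  uint32_t units_per_step, float unit)
{
    uint32_t total_units = 0;

    for (uint32_t i = 0; i < steps; i++)
        total_units += units_per_step;   /* no rounding here */

    /* One multiply and one add in float for the whole batch. */
    volatile float result = sum + (float)total_units * unit;
    return result;
}
```

For the example in the question, `add_batched(1.0f, 1000, 2, 1.0e-08f)` adds 2000 units of 1.0e-08 in one step, which gives a value close to 1.00002 instead of leaving the sum stuck at 1.0.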

Lui