
Is `math.h` in the standard library optimized for embedded systems, especially the ARM architecture (Cortex-M4)? If not, what is the fastest alternative way to do math on a microcontroller?

  • What ARM CPU exactly? Cortex-M3 does not have hardware floating point, so using `float` and `double` data types is slow because they must be calculated in software. Cortex-M4F seems to support `float` in hardware only (so `double` must be simulated)... – Martin Rosenau Feb 04 '22 at 14:00
  • The standard library is platform-specific. It would be sensible to optimize it for a given target, but "ARM" could mean anything. The low-end Cortex M don't have a FPU for example. – Lundin Feb 04 '22 at 14:00
  • 1
    Also "what is the best" isn't a question that can be answered. Fastest? Least memory? Best resolution? Easiest to use? Least floating point bugs? – Lundin Feb 04 '22 at 14:01
  • The file `math.h` contains only declarations of functions; the actual functions are implemented in object files in a "library" (in GCC this file would be named `libm.a`). There are different `libm.a` files for different CPU types. Of course, there are `libm.a` files for different ARM CPUs, too. – Martin Rosenau Feb 04 '22 at 14:02
  • I edited the question to be more specific. – Mohamed AbdelAzeem Feb 04 '22 at 14:36

1 Answer


"Optimised" is a vague term - perhaps you meant "optimal". Any library provided with the toolchain is likely to be "optimised" in the sense that it has had either compiler optimisation applied, or uses intrinsics or hand-written assembler. Whether it is optimal is another matter.

Using floating point in any event has an impact. The traditional C math library functions operate on double precision (C99 added single-precision variants such as `sinf()`), and Cortex-M4 devices either have no FPU or a single-precision FPU (which can be disabled). Moreover it is a 32-bit device and `double` is a 64-bit data type, so every operand load takes two bus transactions rather than one.

Single-precision operations will be faster on a Cortex-M4 whether or not it has an FPU. The C++ library `<cmath>` may be faster still for single-precision math, since it overloads all the C library functions with single-precision `float` versions; these can conceivably be optimised for the Cortex-M4F single-precision FPU, and in any case pass operands in a single register. You would, however, have to test this with your specific toolchain.

Be aware however that software floating point (used on targets with no FPU, or where the FPU is disabled) will be very much slower and generate significantly more code than hardware floating point.

There may also be good reasons to avoid hardware floating point in any case. In hard real-time systems under an RTOS, use of the FP hardware requires that the FPU registers be preserved across context switches. If your application does very little math but a lot of context switching, the context-switch overhead added by FP register preservation may exceed the performance benefit of the floating-point hardware.

For devices without an FPU, math operations will be much faster using fixed-point math, but in C that can be cumbersome, and a full math.h-like interface with trig, log etc. can become complex. I have been using a C++ fixed-point math library for years that uses the CORDIC algorithm for all the higher-level math functions, with a 64-bit 36Q28 fixed-point representation allowing a range of about ±2³⁵ and about 8 decimal places of precision. On ARM it is perhaps about 5 times faster than a typical software floating-point library, and comparable to the performance of a hardware FPU for some operations on some targets (with simple FPUs). By using a `fixed` class and extensive function overloading, for the most part code using `double` or `float` need only replace those with `fixed` and the code will work (within the range and precision limits of the 36Q28 representation).

The library is based on Anthony Williams' article originally published in Dr. Dobb's Journal in 2008. I have added functions for conversion to/from decimal string representations for input and presentation, improved the `sqrt()` precision for very small values, and fixed a bug in the log lookup table (mentioned in a comment at the link above, which in turn refers to this answer; that answer also has an example of using the `fixed` data type to calculate the distance and bearing between two geographical lon/lat positions, so you can see what code using it looks like).

Whilst Anthony's library requires C++, that does not mean your code need be object-oriented - if using C++ makes you nervous or is unfamiliar, you can largely write procedural C-like code and simply compile it as C++ in order to take advantage of the library. The advantages are largely syntactical - code using `fixed` looks much like code using `double` or `float`, and the algorithms could equally be implemented in C, but instead of say:

fixed x = (fixed)1 / 2 ; // 0.5 in fixed point

you might have:

tFixed x = fixedDiv( toFixed(1), toFixed(2) ) ;

where `tFixed` is a typedef alias for `int64_t`, for example. As you can see, C++ results in syntactically clearer code here. Some might argue however that it "hides" what is really going on - which it does; deliberately so.

If your range and precision needs are lower, you could equally implement the fixed point using a 32-bit integer type; the performance advantage would be significant, but the chance of overflow or precision error also much greater.

For simple arithmetic (+, -, *, /) the implementation is trivial and, unless you are doing many such operations, hardly needs a library; you can simply scale your operands and results as necessary for the particular calculation, so each individual operation can be optimised to the needs of the specific calculation rather than being generic. If you are doing a lot of math, trig etc., a more general-purpose library would be an advantage - but it will always be sub-optimal (coming back to your original question) because it must necessarily encompass a wide range of use cases.

Clifford