Floating point subtraction using sign bit flip and add

Question

Taking the double subtraction code from the question "Replacing __aeabi_dsub to save space (-flto issues)" and adjusting it slightly (for both double and float values):

extern "C" double __aeabi_dsub(double a, double b) {
  // flip top bit of 64 bit number (the sign bit)
  reinterpret_cast<unsigned char *>(&b)[7] ^= 0x80; // assume little endian
  return a + b;
}


extern "C" float __aeabi_fsub(float a, float b) {
  // flip top bit of 32 bit number (the sign bit)
  reinterpret_cast<unsigned char *>(&b)[3] ^= 0x80; // assume little endian
  return a + b;
}

Do these implementations of a - b (for double/float) break any floating point code or violate the IEEE specification? Assume an ARM Cortex-M0 target without floating point hardware, compiled with GCC.
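
For comparison, here is a minimal sketch of the same sign flip written without the byte-index (little endian) assumption. The name `flip_sign` is hypothetical and not part of libgcc or the ARM EABI. The sketch assumes IEEE 754 binary64/binary32 layouts and that integers and floating point values share the same byte order, which holds on common targets but is not guaranteed by the language.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "sketch expects a 64-bit double");
static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch expects a 32-bit float");

// Hypothetical helper: toggle the sign bit of a double via its integer image.
inline double flip_sign(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bit pattern
  bits ^= std::uint64_t{1} << 63;       // bit 63 is the sign bit of binary64
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// Same idea for float: bit 31 is the sign bit of binary32.
inline float flip_sign(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits ^= std::uint32_t{1} << 31;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// Small usage example.
inline void flip_sign_demo() {
  assert(flip_sign(1.5) == -1.5);
  assert(flip_sign(-2.0f) == 2.0f);
  assert(flip_sign(3.0) + 3.0 == 0.0);
}
```

Whether this costs more or less ROM than the byte-wise XOR depends on the compiler and flags, so it is worth checking the map file before adopting it.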

Yes, this is evil, but I need the space (several KB of ROM saved by this) and cannot yet get rid of floating point (in favor of fixed point) calculations. — Daniel Jour, Jul 10 '23 at 22:31
If this actually saves any space, you need a better floating point library. Generally fp add (in software) is implemented by checking if the signs of the operands are the same -- if not, flip the sign of the second and call the fp subtract routine. — Chris Dodd, Jul 10 '23 at 23:12
@Chris, that's true, but may not be the full story. I've written code (though for "bignum" integers) that delegated functionality (so for non-negative `a, b`, `a + -b` became `a - b`, `-a - b` became `-(a+b)`, and so on) but both add and subtract functions were required because they handled specific cases (non-negative numbers). If that's the case here, this won't save anything since the add may delegate to the subtract (which would cause infinite recursion since both add and subtract now delegate to each other for that one case). — paxdiablo, Jul 10 '23 at 23:19
Per IEEE 754-2019 6.3, IEEE 754 does not specify the sign bit of a NaN result for operations other than copy, negate, abs, and copySign, but, when `b` is a NaN, a processor (or software arithmetic routines) might produce a different result for `a - b` than it does for `a + -b`. In other words, it does not violate IEEE 754 but may change the behavior of a program. `a - b` might produce the NaN `b` as the result whereas `a + -b` might produce `-b` as the result. — Eric Postpischil, Jul 11 '23 at 10:38
@ChrisDodd It's just the code from the standard libgcc which is pulled in by GCC automatically on use of floating point types on a target without floating point hardware support. From the map file: `lib/gcc/arm-none-eabi/12.2.1/thumb/v6-m/nofp\libgcc.a(adddf3.o)` — Daniel Jour, Jul 11 '23 at 12:58
(So the code of the library should be basically this: https://github.com/gcc-mirror/gcc/tree/releases/gcc-12.2.0/libgcc/soft-fp) — Daniel Jour, Jul 11 '23 at 13:04
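
The NaN caveat raised in the comments can be made concrete with a small host-side sketch. It only shows that flipping the top bit of a NaN yields a NaN with the opposite sign bit, which is why `a + -b` may return a differently signed NaN than `a - b`. It does not show what any particular add routine returns. The helper names are hypothetical, and a 64-bit IEEE 754 double is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: raw bit pattern of a double (assumes binary64).
inline std::uint64_t double_bits(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return bits;
}

// Hypothetical helper: toggle the sign bit, as the question's code does.
inline double toggle_sign_bit(double x) {
  std::uint64_t bits = double_bits(x) ^ (std::uint64_t{1} << 63);
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// Usage example: the flipped NaN is still a NaN, and only its sign bit changed.
inline void nan_sign_demo() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const double flipped = toggle_sign_bit(nan);
  assert(std::isnan(flipped));
  assert(std::signbit(flipped) != std::signbit(nan));
  assert((double_bits(nan) ^ double_bits(flipped)) == (std::uint64_t{1} << 63));
}
```

Code that never inspects the sign or payload of a NaN would not notice the difference.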

score 3 · Answer 1 · answered Jul 10 '23 at 22:46

Assuming IEEE 754 floating point, this shouldn't break any code, which is easy to see by looking at the compiled code.

double dsub1(double a, double b) {
  reinterpret_cast<unsigned char *>(&b)[7] ^= 0x80; // assume little endian
  return a + b;
}

double dsub2(double a, double b) {
  return a - b;
}

is compiled to

dsub1(double, double):                             // @dsub1(double, double)
        fsub    d0, d0, d1
        ret
dsub2(double, double):                             // @dsub2(double, double)
        fsub    d0, d0, d1
        ret

(https://godbolt.org/z/rY4h5YTqb)

As you can see, these are equivalent even at a low optimization level that doesn't allow incompatible FP transformations.
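
Beyond reading the assembly, the equivalence can be spot-checked on a host by comparing bit patterns of `a - b` and the sign-flip version over a handful of values. This is a sketch under assumptions, not a proof: the helper names are hypothetical, a 64-bit IEEE 754 double is assumed, and NaN results are only required to both be NaN since their sign is not specified.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: raw bit pattern of a double (assumes binary64).
inline std::uint64_t raw_bits(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return bits;
}

// Subtraction expressed as sign flip plus add, as in the question.
inline double sub_by_sign_flip(double a, double b) {
  std::uint64_t bits = raw_bits(b) ^ (std::uint64_t{1} << 63);
  std::memcpy(&b, &bits, sizeof b);
  return a + b;
}

// Both NaN, or bit-identical (which also distinguishes +0 from -0).
inline bool same_result(double x, double y) {
  if (std::isnan(x) || std::isnan(y)) return std::isnan(x) && std::isnan(y);
  return raw_bits(x) == raw_bits(y);
}

// Compare every pair drawn from a small set of ordinary and special values.
inline void spot_check_samples() {
  using lim = std::numeric_limits<double>;
  const double samples[] = {0.0, -0.0, 1.0, -1.0, 0.1, 1e308, -1e308,
                            lim::denorm_min(), lim::infinity(),
                            -lim::infinity(), lim::quiet_NaN()};
  for (double a : samples)
    for (double b : samples)
      assert(same_result(a - b, sub_by_sign_flip(a, b)));
}
```

A run like this on a desktop CPU says nothing about the soft-float routines on the Cortex-M0, so the same check would have to be repeated on the target.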

answered Jul 10 '23 at 22:46

vitaut


Ah that's clever! Thanks – Daniel Jour Jul 10 '23 at 22:48
Note, the selected compiler is targeting ARMv8. If you select a compiler that targets the actual cortex-m CPU, the results are different. https://godbolt.org/z/snoKdnedr – artless noise Jul 11 '23 at 19:33
The actual compiler/arch used in the example doesn't matter since its only purpose is demonstrating equivalence of the two methods. But it could matter in the actual application if the goal is reducing binary size. – vitaut Jul 11 '23 at 21:13
But the behaviour of `fsub` may be different on different CPUs, so support for the underlying float operations may need different care. The compiler may realize things don't matter and collapse the functions for a different CPU. It would seem strange that both LLVM and GCC do not make the functions identical when selecting a 32-bit ARM CPU. I really don't understand how you can see equivalence on one CPU and then say that all CPUs in the world must be OK with this. The ISA is different, so there may be oddities of the ARM32 FPU (for historical reasons) that need to be accounted for. – artless noise Jul 11 '23 at 23:09
IEEE 754 operations are well-defined so it cannot be different unless there is a compiler bug. This is more of an abstract FP arithmetic question that doesn't depend on architecture. Some architectures may not implement IEEE 754 correctly but that's a different question altogether. – vitaut Jul 11 '23 at 23:39

paxdiablo · Answer 2 · 2023-07-11T00:15:00.477

That should be fine, at least on a conceptual level. However, you need to be a little bit careful here.

The fact that the subtract routine is about the same size as the add routine could mean (at least) two things:

the library writers did a poor job; or
they did a good job but you don't yet realise it :-)

The reason I state this is because I've written multi-precision integer libraries in the past where, other than some delegation, the add and subtract routines could assume certain properties to allow for simplified code. So, for example, the (pseudo-) code would be something like (in the comments, +x means x >= 0, -x means x < 0):

def add(a, b):
    if a < 0:                        # strict test: zero must not be delegated, or add(0, 0) would recurse forever.
        if b < 0:
            return -add(-a, -b)      # -a, -b.
        return sub(b, -a)            # -a, +b.

    if b < 0:
        return sub(a, -b)            # +a, -b.

    # +a, +b, hence greatly simplified code.

def sub(a, b):
    if a < 0:
        if b < 0:
            return -sub(-a, -b)      # -a, -b.
        return -add(-a, b)           # -a, +b.

    if b < 0:
        return add(a, -b)            # +a, -b.

    if a < b:
        return -sub(b, a)            # +a, +b, a < b.

    # +a, +b, a >= b, hence greatly simplified code.

The comments to the right show the guaranteed conditions which make the if condition true. Not all of these are explicitly checked since, without them, an earlier if statement would have been true and the function would already have returned.

The "simplified code" area could then concentrate on doing its job knowing that the numbers it had were "safe". For example:

It could do addition knowing that both numbers were non-negative, so it's a simple matter of starting at the right and adding digits with carry.
It could do subtraction without having to worry that the second number was bigger than the first, something that results in an "infinite borrow" problem in naive implementations.
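
To make the delegation scheme above concrete, here is a minimal C++ sketch that uses a toy sign-magnitude integer in place of a real bignum. All names (`SignMag`, `negate`, `from_int`, `to_int`) are hypothetical, and overflow of the 32-bit magnitude is ignored. It only illustrates how the two "simplified" cores end up handling non-negative operands.

```cpp
#include <cassert>
#include <cstdint>

// Toy sign-magnitude integer standing in for a bignum (hypothetical type).
struct SignMag {
  bool negative;      // true for values below zero
  std::uint32_t mag;  // absolute value; a real bignum would hold digits
};

// Flip the sign while keeping a single, non-negative representation of zero.
inline SignMag negate(SignMag x) {
  x.negative = !x.negative && x.mag != 0;
  return x;
}

SignMag sub(SignMag a, SignMag b);

SignMag add(SignMag a, SignMag b) {
  if (a.negative) {
    if (b.negative) return negate(add(negate(a), negate(b)));  // -a, -b
    return sub(b, negate(a));                                   // -a, +b
  }
  if (b.negative) return sub(a, negate(b));                     // +a, -b
  return SignMag{false, a.mag + b.mag};  // +a, +b: plain magnitude addition
}

SignMag sub(SignMag a, SignMag b) {
  if (a.negative) {
    if (b.negative) return negate(sub(negate(a), negate(b)));  // -a, -b
    return negate(add(negate(a), b));                           // -a, +b
  }
  if (b.negative) return add(a, negate(b));                     // +a, -b
  if (a.mag < b.mag) return negate(sub(b, a));                  // +a, +b, a < b
  return SignMag{false, a.mag - b.mag};  // +a, +b, a >= b: no borrow past the top
}

// Conversions used only by the usage example.
inline SignMag from_int(std::int32_t v) {
  const std::int64_t wide = v;
  return v < 0 ? SignMag{true, static_cast<std::uint32_t>(-wide)}
               : SignMag{false, static_cast<std::uint32_t>(wide)};
}

inline std::int64_t to_int(SignMag x) {
  const std::int64_t m = x.mag;
  return x.negative ? -m : m;
}

// Usage example.
inline void delegation_demo() {
  assert(to_int(add(from_int(5), from_int(-1))) == 4);
  assert(to_int(sub(from_int(3), from_int(5))) == -2);
  assert(to_int(sub(from_int(-3), from_int(-4))) == 1);
}
```

Note that `add` and `sub` call each other here, which is exactly the property that matters when one of them is replaced.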

So, if your add and subtract routines are basically duplicates (i.e., the library writers did a poor job) without referencing each other (even indirectly through other calls), you will probably be able to save some space by using your method.

However, if the library writers were a bit cleverer than that, it may well be that they've done a delegation job similar to what I describe above. That means it would be a rather bad idea to replace sub with something like what you are proposing:

def sub(a, b):
    return add(a, -b)

That's because add(5, -1) would almost certainly delegate that call to sub(5, 1). Which would, of course, send it back to add(5, -1), and so on, right up until the point your stack overflows :-)
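
The cycle can be demonstrated safely with a depth counter instead of a real stack overflow. This is a sketch with plain `int` values and hypothetical names (`CallTrace`, `kMaxDepth`). The `add` here is an assumed stand-in for a delegating library routine, not the actual libgcc code.

```cpp
#include <cassert>

// Hypothetical instrumentation: counts nested calls so the demo can stop
// instead of overflowing the real stack.
struct CallTrace {
  int depth = 0;
  bool runaway = false;
};

constexpr int kMaxDepth = 1000;

int add(int a, int b, CallTrace& t);

// The proposed shortcut: subtraction as addition of the negated operand.
int sub(int a, int b, CallTrace& t) {
  if (++t.depth > kMaxDepth) { t.runaway = true; return 0; }
  return add(a, -b, t);
}

// An add routine that hands the (+a, -b) case to sub, as a delegating
// design would.
int add(int a, int b, CallTrace& t) {
  if (++t.depth > kMaxDepth) { t.runaway = true; return 0; }
  if (a >= 0 && b < 0) return sub(a, -b, t);
  return a + b;
}

// Usage example: sub(5, 1) bounces between the two routines until the guard trips.
inline void recursion_demo() {
  CallTrace cycle;
  sub(5, 1, cycle);
  assert(cycle.runaway);

  CallTrace fine;
  assert(add(5, 1, fine) == 6);
  assert(!fine.runaway);
}
```

If the real add routine never calls the subtract routine, no such cycle exists and the replacement is safe in this respect.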

So, just be certain that these delegations do not happen before you assume that your method will work. Because this is the sort of thing a library writer should have put in their code (but see the "did a poor job" text above).

As noted in a comment above, the library in question is the standard libgcc shipped with GCC arm-none-eabi 12.2.1: `__aeabi_dadd` comes from `lib/gcc/arm-none-eabi/12.2.1/thumb/v6-m/nofp\libgcc.a(adddf3.o)` — Daniel Jour, Jul 11 '23 at 13:00
