
I recently wrote some code that uses memcpy to unpack float/double to unsigned integers of the appropriate width, and then uses some bit-shifting to separate the sign bit from the combined significand/exponent.

Note: For my use case, I don't need to separate the latter two parts from each other, but I do need them in the correct order, i.e. {sign, (exponent, significand)}, with the latter tuple packed as an unsigned int of sufficient width.
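
For illustration only, the unpacking described above might look roughly like the following sketch. The real code is not shown in the question, so `FloatParts`, `split` and `join` are hypothetical names, and the sketch assumes a 32-bit IEEE-754 `float` whose byte order matches that of `std::uint32_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical result type: the sign bit, plus the remaining 31 bits
// (8 exponent bits sitting above 23 significand bits).
struct FloatParts {
    bool sign;
    std::uint32_t exp_sig;
};

// Assumption: float and uint32_t use the same byte order.
FloatParts split(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));  // well-defined, unlike pointer casts
    return FloatParts{(bits >> 31) != 0u, bits & 0x7FFFFFFFu};
}

// Inverse of split(): reassembles the float from its two parts.
float join(FloatParts parts) {
    assert((parts.exp_sig >> 31) == 0u && "exp_sig must fit in 31 bits");
    const std::uint32_t bits = (parts.sign ? 0x80000000u : 0u) | parts.exp_sig;
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

The shifts and masks operate on the integer's value, so they do not depend on how the integer is stored; only the `memcpy` step is sensitive to a mismatch between float and integer byte order.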

My code works and is thoroughly tested; however, I was a bit alarmed to discover that IEEE-754 doesn't specify the endianness of its binary interchange formats, which, to my understanding, means there is a small possibility that my bit-shifting is incorrect on the rare platforms where float endianness ≠ integer endianness.

Based on the answered question here, my assumption is that given that bit-shifting is independent of actual endianness in storage, I only need to worry about whether the endianness of my floats matches that of my ints.
I devised the following code loosely following that in the linked answer, but avoiding the use of type-punning through pointer casts, which seems like unspecified/undefined behaviour territory to me:

#include <cstdint>
#include <cstring>

// SAME means "same as integer endianness"
enum class FloatEndian { UNKNOWN, SAME, OPPOSITE, };

FloatEndian endianness() {
    float check = -0.0f; // in IEEE-754, this should be all-zero significand and exponent with sign=1
    std::uint32_t view;
    std::memcpy(&view, &check, sizeof(check));
    switch (view) {
    case 0x80000000: // sign bit is in most significant byte
        return FloatEndian::SAME;
    case 0x00000080: // sign bit is in least significant byte
        return FloatEndian::OPPOSITE;
    default: // can't detect endianness of float
        return FloatEndian::UNKNOWN;
    }
}

If I ensure that my floats are indeed IEEE-754 with std::numeric_limits<T>::is_iec559, is my approach a robust and portable way of making sure I get the floats "the right way round" when I chop them up?

saxbophone
  • I suppose we could even imagine a machine where floating-point numbers are *mixed* endian, where the sign bit is in (say) the least significant byte, but the other bytes are ordered in some other fashion. – Nate Eldredge Jul 31 '22 at 21:42
  • @NateEldredge yes indeed, although (correct me if I'm wrong) if IEEE-754 format is guaranteed, then this possibility is excluded, as the ordering of the fields (sign, exponent, significand) _is_ specified **in relation to each other** for that format; it's just the literal order of the bytes _all together_ that is unspecified. – saxbophone Jul 31 '22 at 21:46
  • At least `2301` ordering should be possible, if not for `float` then for `int32`, e.g., if the machine only has native 16-bit support. That said, I would not rely on any kind of runtime detection. Just provide a list of architectures where your code was tested and is known to work. If you have a CPU with weird byte ordering, there might be other quirks as well ... – chtz Jul 31 '22 at 22:10
  • @chtz why would you not rely on any runtime detection? (As an aside, I would prefer to do it at compile time. This can be done if C++20's `bit_cast()` is used instead of memcpy; I've tested it on Godbolt.org, but I don't have library support for it on the toolchain currently installed on my dev machine.) – saxbophone Jul 31 '22 at 22:19
  • @NateEldredge The VAX-11 was one such real machine, possibly not in floating-point but certainly in 32-bit integers. – user207421 Aug 01 '22 at 00:45
  • @saxbophone You are compiling for a specific CPU. It is therefore pointless *not* to handle this at compile time. – user207421 Aug 01 '22 at 00:46
  • @saxbophone see [mixed endian](https://en.wikipedia.org/wiki/Endianness#Middle-endian). The digits in `0123`, `3210`, `2301` denote the byte numbers. – phuclv Aug 01 '22 at 01:31
  • @user207421 I would agree, except for the fact of whether the functions needed to work out the orientation of the floats with respect to int, can run at compile-time or not. For instance, if `bit_cast()` is not available due to lacking library/C++20 support, then one might have to use `memcpy()`, which I don't think is `constexpr`..? – saxbophone Aug 01 '22 at 01:52
  • @saxbophone "My code is ..., thoroughly tested ..." --> How many different platforms/compilers was it tested with? – chux - Reinstate Monica Aug 01 '22 at 11:41
  • Why not use macro `__BYTE_ORDER` just like GNU header `ieee754.h` does? – Maxim Egorushkin Aug 01 '22 at 14:18
  • @MaximEgorushkin Because the endianness of `float` and `uint32_t` may differ. `__BYTE_ORDER__` is for integers; `__FLOAT_WORD_ORDER__` is for FP. IAC, it is a GNU extension and not necessarily available with other compilers. – chux - Reinstate Monica Aug 01 '22 at 16:45
  • @chux-ReinstateMonica On what platform are byte orders of `float` and `uint32_t` different? Shouldn't you file a bug report against `ieee754.h` with a reproduction where `ieee754.h` is wrong? – Maxim Egorushkin Aug 02 '22 at 11:38
  • @MaximEgorushkin "what platform are byte orders of float and uint32_t different" is a good question to post. I have not seen one in the last 10 yrs and also wonder how common it remains; it certainly is rare. [number of hardware architectures where floating-point numbers are represented in big-endian form while integers are represented in little-endian form](https://en.wikipedia.org/wiki/Endianness#Floating_point). IAC, C does not require the same endian. What specifically is the bug you are suggesting in ieee754.h? It already supports different endians for integer/float. – chux - Reinstate Monica Aug 02 '22 at 14:03
  • @chux-ReinstateMonica You are right, `ieee754.h` uses `__FLOAT_WORD_ORDER` for machines with different endianness for floating point values, my mistake. – Maxim Egorushkin Aug 06 '22 at 21:23
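
The compile-time idea raised in the comments above (C++20 `std::bit_cast` where the library provides it, `memcpy` otherwise) could be sketched as follows. This is only an illustration under stated assumptions: `neg_zero_bits` and `kCheckedAtCompileTime` are hypothetical names, and the feature-test macro `__cpp_lib_bit_cast` is used to pick the path.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__has_include)
#  if __has_include(<bit>)
#    include <bit>  // std::bit_cast lives here in C++20
#  endif
#endif

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

#if defined(__cpp_lib_bit_cast)
// C++20 path: usable in constant expressions such as static_assert.
constexpr std::uint32_t neg_zero_bits() {
    return std::bit_cast<std::uint32_t>(-0.0f);
}
constexpr bool kCheckedAtCompileTime = true;
#else
// Fallback: std::memcpy is not constexpr, so this runs at run time.
inline std::uint32_t neg_zero_bits() {
    const float neg_zero = -0.0f;
    std::uint32_t bits;
    std::memcpy(&bits, &neg_zero, sizeof(bits));
    return bits;
}
constexpr bool kCheckedAtCompileTime = false;
#endif
```

On the C++20 path the result can feed a `static_assert`, so a mismatch would stop the build instead of being discovered at run time.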

1 Answer


Is checking the location of the sign bit enough to determine endianness of IEEE-754 float with respect to integer endianness?

  • As I read it, given the C++ spec and the C spec that it tends to also rely on, checking the sign bit is technically insufficient to determine the endian relationship between float/uint32_t. It is likely sufficient in practice, as endians other than big/little are rare, as are differences between float/uint32_t endian.

  • I would suggest a different constant than -0.0f, maybe -0x1.ca8642p-113f, which has the bit pattern 0x87654321 and would make for a more thorough endian test. It is unclear why OP wants to use -0.0f, which has only one set bit, to discern 3 possible results.

  • As mentioned by others, in C++, the test should be a compile time one, so thoroughness is not a run-time cost over the simplicity of only testing a sign bit.

  • Relying on is_iec559 being true may unnecessarily limit portability, since for that to be true, many compliance rules for non-finite values are needed. ref. Does your code really need quiet and signaling NaNs?

  • See also If is_iec559 is true, does that mean that I can extract exponent and mantissa in a well defined way?.

  • I hope OP also tests that the sizeof(float) == sizeof(uint32_t) else memcpy(&view, &check, sizeof(check)) is bad code.
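
The suggestions in the list above (a size check, plus a probe constant in which every byte is distinct) could be combined along the following lines. This is a minimal sketch and not OP's code: `FloatByteOrder`, `float_byte_order`, `byte_swap32` and `float_bits` are hypothetical names, the hex float literal needs C++17 (or a compiler extension), and the expected patterns assume an IEEE-754 binary32 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical names; Other covers mixed orders such as 2301.
enum class FloatByteOrder { Same, Opposite, Other };

FloatByteOrder float_byte_order() {
    // As an IEEE-754 binary32 value this encodes as 0x87654321, so every
    // byte is distinct and any reordering of the bytes is visible.
    const float probe = -0x1.ca8642p-113f;
    std::uint32_t view;
    std::memcpy(&view, &probe, sizeof(view));
    switch (view) {
    case 0x87654321u: return FloatByteOrder::Same;
    case 0x21436587u: return FloatByteOrder::Opposite;
    default:          return FloatByteOrder::Other;
    }
}

// Reverses the four bytes of a 32-bit value.
std::uint32_t byte_swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Returns the float's bits with the sign bit in bit 31 of the integer.
// Only the Same and Opposite orders are handled by this sketch.
std::uint32_t float_bits(float value) {
    const FloatByteOrder order = float_byte_order();
    assert(order != FloatByteOrder::Other && "unsupported float byte order");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return order == FloatByteOrder::Opposite ? byte_swap32(bits) : bits;
}
```

Unlike a sign-bit-only probe, a mixed order such as `2301` yields `0x43218765` here and lands in `Other` instead of being mistaken for `Same` or `Opposite`.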

is my approach a robust and portable way of making sure I get the floats "the right way round" when I chop them up?

  • Code is not as robust and portable as it could be.

  • "when I chop them up" --> that code is not shown, so unanswerable. I am suspicious of the endianness() goal that is used to support "uses memcpy to unpack float/double to unsigned integers of the appropriate width, and then uses some bit-shifting to separate the sign bit from the combined significand/exponent." It is that code that deserves review.

chux - Reinstate Monica
  • For the purposes of what I'm doing (a little bit of bit-hacking on IEEE-754's representation), I'm completely ok with requiring `is_iec559` if it means I can guarantee (up to the point of a lying implementation of `std::numeric_limits` ;) ) that my floats are indeed IEEE-754. I only care that my code is portable among all systems that use IEEE-754 format. – saxbophone Aug 01 '22 at 17:34
  • @saxbophone 1) Since "only care that my code is portable" is the issue, not "Is checking the location of the sign bit enough", that code should be reviewed. 2) Requiring `is_iec559` limits portability. – chux - Reinstate Monica Aug 01 '22 at 17:39
  • The point about the value being used for the endianness test is interesting. The reason I decided to use `-0.0f` is that, the way I see it, if the float is in big-endian or little-endian format, I can split it apart into sign and combined exponent and significand; I only need to flip the bytes if it's the other way round from what I expect, and then split off the sign. I assumed that if it's neither of those ways round, there's no way for me to split off all the parts in a well-defined way, so I only have "SAME", "OPPOSITE" and "UNKNOWN". Is it possible to split in the latter case? – saxbophone Aug 01 '22 at 17:40
  • Do you think it would be worthwhile submitting the float-chopping, bit-shifting code to https://codereview.stackexchange.com/ ? – saxbophone Aug 01 '22 at 17:45
  • @saxbophone If the endian is not big nor little, `endianness()` may return `SAME` or `OPPOSITE`. IOWs, `endianness()` is a good test for big and little, but not a good test for other endians like PDP. – chux - Reinstate Monica Aug 01 '22 at 17:47
  • @saxbophone Yes, codereview is a good place for code review, given that you reasonably believe it is error free. – chux - Reinstate Monica Aug 01 '22 at 17:48
  • Oh, I think I see it now: if it was some kind of middle-endian, it's possible that the sign bit might still get put into the first or last byte? – saxbophone Aug 01 '22 at 17:49
  • @saxbophone Yes, that is what a more robust pattern than 0x80000000 provides. IMO, consider the 3 known integer endians (big, little, PDP) and 2 known FP endians (big, little) (e.g. GCC `__BYTE_ORDER__`, `__FLOAT_WORD_ORDER__`) and what those combinations could result in for your test. Yet I still think this dances around the issue, as it is the unposted float-chopping code itself that deserves review. Possibly a good solution for that may not even need `endianness()`. – chux - Reinstate Monica Aug 01 '22 at 18:02
  • That is a really good point about the potential lack of need for the approach I'm using in the first place. I think my question might have hit the XY problem. Thank you for your forensic answer and follow-up, it is most helpful. – saxbophone Aug 01 '22 at 18:07
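
As a rough illustration of the integer/FP endian combinations mentioned in the comments above, the GCC predefined macros can be compared at compile time. This is a sketch only: `Order`, `integer_order`, `float_word_order` and `orders_known_to_match` are hypothetical names, the macros are a GNU extension that other compilers may not define, and GCC documents `__FLOAT_WORD_ORDER__` as describing the word order of multi-word floating-point quantities, so a match is a hint, not proof.

```cpp
// Hypothetical names; Unknown means the compiler did not define the macros.
enum class Order { Little, Big, Pdp, Unknown };

constexpr Order integer_order() {
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return Order::Little;
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return Order::Big;
#elif defined(__BYTE_ORDER__) && defined(__ORDER_PDP_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_PDP_ENDIAN__
    return Order::Pdp;
#else
    return Order::Unknown;
#endif
}

constexpr Order float_word_order() {
#if defined(__FLOAT_WORD_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    __FLOAT_WORD_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return Order::Little;
#elif defined(__FLOAT_WORD_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __FLOAT_WORD_ORDER__ == __ORDER_BIG_ENDIAN__
    return Order::Big;
#else
    return Order::Unknown;
#endif
}

// True only when both orders are reported and they agree.
constexpr bool orders_known_to_match() {
    return integer_order() != Order::Unknown &&
           integer_order() == float_word_order();
}
```

A PDP integer order can never equal a float word order here, so that combination is reported as not matching, which is the conservative outcome.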