Is IEEE-754 representation used in C?

Question

I have to encode the electron charge, which is -1.602*10^-19 C, using IEEE-754. I did it manually and verified my result using this site. So I know my representation is good. My problem is that, if I try to build a C program showing my number in scientific notation, I get the wrong number.

Here is my code:

#include <stdio.h>
int main(int argc, char const *argv[])
{
    float q = 0xa03d217b;
    printf("q = %e", q);
    return 0;
}

Here is the result:

$ ./test.exe
q = 2.688361e+09

My question: Is there another representation that my CPU might be using internally for floating point other than IEEE-754?

If you want precision why are you using `float` instead of `double`? What is the motivation behind using a hexadecimal encoded float? This is all a very, very bad idea. — tadman, Dec 28 '20 at 16:43
`double q = -1.602e-19`. Done. Don't play games here, you'll win dumb prizes. Just express it in its minimal form. — tadman, Dec 28 '20 at 16:43
What CPU are you using? Did you read the documentation for it to check how a `float` is represented internally? Did you account for [endian](https://en.wikipedia.org/wiki/Endianness) issues? — tadman, Dec 28 '20 at 16:45
I precisely have to encode the number on 32 bits. I also forgot to mention that this is an exercise. I had to encode the number using IEEE754 manually. This C program is just here to verify the result. — Corentin F, Dec 28 '20 at 16:47

Adrian Mole · Accepted Answer · 2020-12-28T17:12:29.733

11

The line float q = 0xa03d217b; converts the integer (hex) literal into a float holding that numeric value, or the nearest representable approximation; it does not copy the bit pattern. The value assigned to your q is therefore roughly 2,688,360,827 (the decimal value of 0xa03d217b), which is the 2.688361e+09 you observed.
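To make the difference between converting a value and copying a bit pattern concrete, here is a minimal sketch. The helper names value_converted and stored_bits are hypothetical, and the sketch assumes float is 32 bits wide, which the C standard does not guarantee.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the same conversion as the question's initializer,
   with the cast spelled out. The integer VALUE 2688360827 is rounded to
   the nearest representable float. */
float value_converted(void) {
    float q = (float)0xa03d217b;
    return q;
}

/* Hypothetical helper: returns the bit pattern actually stored in f.
   Assumes float is 32 bits wide. */
uint32_t stored_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On an IEEE-754 machine, stored_bits(value_converted()) should differ from 0xa03d217b, because the initializer converted the number rather than reinterpreting its bits.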

If you must initialize a float variable with its internal IEEE-754 (HEX) representation, then your best option is to use type punning via the members of a union (legal in C but not in C++):

#include <stdio.h>

typedef union {
    float f;
    unsigned int h;
} hexfloat;

int main()
{
    hexfloat hf;
    hf.h = 0xa03d217b;
    float q = hf.f;
    printf("%lg\n", q);
    return 0;
}

There are also some 'quick tricks' using pointer casting, like:

unsigned iee = 0xa03d217b;
float q = *(float*)(&iee);

But, be aware, there are numerous issues with such approaches, like potential endianness conflicts and the fact that you're breaking strict aliasing requirements.

edited Dec 28 '20 at 17:12

answered Dec 28 '20 at 16:49

This is probably the original intent, but yeah, there's a *lot* of caveats, so it's nice of you to lay those out. – tadman Dec 28 '20 at 16:53
You can avoid the aliasing problems by using a `union`, and I've never seen nor heard of a machine that uses different endianness for ints and floats, but it is technically possible. More likely is a machine that uses non-IEEE fp, though those are rare these days. – Chris Dodd Dec 28 '20 at 16:57
@ChrisDodd I was adding my edit while you were posting your comment. – Adrian Mole Dec 28 '20 at 16:59
Why bother giving somebody an improper method when an easy proper method exists? There is no reason to present the aliasing-via-pointer method at all. The union method is fine in C, but so is copying bytes, and that is defined (up to implementation issues) in C and C++. – Eric Postpischil Dec 28 '20 at 17:07
@EricPostpischil Good point. I have rearranged my answer to put the 'proper' method first; however, I have retained the 'improper' method by way of a warning *not* to use it. I have not included the method using `memcpy` (although it is good) because that is **your** answer, not mine. – Adrian Mole Dec 28 '20 at 17:13
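
The byte-copy method mentioned in the comments above can be sketched roughly as follows. The function name float_from_bits is hypothetical, and the sketch assumes that float is a 32-bit IEEE-754 binary32 stored with the same byte order as uint32_t; it is not taken from any of the answers.

```c
#include <stdint.h>
#include <string.h>

/* Refuse to compile where the size assumption does not hold (C11). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: reinterpret a binary32 bit pattern as a float.
   Copying bytes with memcpy avoids the strict-aliasing problem of the
   pointer cast and is defined in both C and C++. */
float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under those assumptions, float_from_bits(0xa03d217b) should give the electron charge value from the question.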

score 2 · Answer 2 · answered Dec 28 '20 at 19:36

My problem is that, if I try to build a C program showing my number in scientific notation, I get the wrong number.

What if your target machine might or might not use IEEE754 encoding? Copying the bit pattern may fail.

If starting with a binary32 constant 0xa03d217b, code could examine it and then build up the best float available for that implementation.

#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#define BINARY32_MASK_SIGN 0x80000000
#define BINARY32_MASK_EXPO 0x7F800000  // 8 exponent bits: bits 30..23
#define BINARY32_MASK_SNCD 0x007FFFFF
#define BINARY32_IMPLIED_BIT 0x800000
#define BINARY32_SHIFT_EXPO 23

float binary32_to_float(uint32_t x) {
  // Break up into 3 parts
  bool sign = x & BINARY32_MASK_SIGN;
  int biased_expo = (x & BINARY32_MASK_EXPO) >> BINARY32_SHIFT_EXPO;
  int32_t significand = x & BINARY32_MASK_SNCD;

  float y;
  if (biased_expo == 0xFF) {
    y = significand ? NAN : INFINITY;  // For simplicity, NaN payload not copied
  } else {
    int expo;
    if (biased_expo > 0) {
      significand |= BINARY32_IMPLIED_BIT;
      expo = biased_expo - 127;
    } else {
      expo = -126;  // subnormal: no implied bit, minimum exponent
    }
    y = ldexpf((float)significand, expo - BINARY32_SHIFT_EXPO);
  }
  if (sign) {
    y = -y;
  }
  return y;
}

Sample usage and output

#include <float.h>
#include <stdio.h>
int main() {
  float e = -1.602e-19;
  printf("%.*e\n", FLT_DECIMAL_DIG, e);
  uint32_t e_as_binary32 = 0xa03d217b;
  printf("%.*e\n", FLT_DECIMAL_DIG, binary32_to_float(e_as_binary32));
}

-1.602000046e-19
-1.602000046e-19
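
As a worked check of the field split for 0xa03d217b, the three binary32 fields can be pulled out on their own. This is only an illustrative sketch; the struct and function names are hypothetical and do not appear in the answer.

```c
#include <stdint.h>

/* Hypothetical struct, for illustration only. */
struct binary32_fields {
    unsigned sign;         /* 1 bit */
    unsigned biased_expo;  /* 8 bits, bias 127 */
    uint32_t fraction;     /* 23 bits, without the implied leading 1 */
};

/* Hypothetical helper: split a binary32 bit pattern into its fields. */
struct binary32_fields binary32_split(uint32_t x) {
    struct binary32_fields f;
    f.sign = (unsigned)(x >> 31);
    f.biased_expo = (unsigned)((x >> 23) & 0xFFu);
    f.fraction = x & 0x007FFFFFu;
    return f;
}
```

For 0xa03d217b this gives sign 1, biased exponent 64 (unbiased 64 - 127 = -63) and fraction 0x3d217b, so the encoded value is -(1 + 0x3d217b / 2^23) * 2^-63, which is about -1.602e-19.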

Bktero · Answer 3 · 2020-12-28T16:58:59.250

Hence, q does not contain the value you expect. The hex value is converted to a float with the same numeric value (approximately), not the same bit representation.

When compiled with clang++ and the option -Wall, there is a warning:

warning: implicit conversion from 'unsigned int' to 'float' changes value from 2688360827 to 2688360704 [-Wimplicit-const-int-float-conversion]

Can be tested on Compiler Explorer.

This warning is apparently not supported by gcc. Instead, you can use the option -Wfloat-conversion (which is not part of -Wall -Wextra):

warning: conversion from 'unsigned int' to 'float' changes value from '2688360827' to '2.6883607e+9f' [-Wfloat-conversion]

Again on Compiler Explorer.

score 0 · Answer 4 · answered Dec 29 '20 at 06:17

Note that C supports hexadecimal floating-point literals (since C99). See https://en.cppreference.com/w/cpp/language/floating_literal for details. This notation lets you write the number portably, without the rounding concerns that come with regular decimal/scientific notation. Here's the number you're interested in:

#include <stdio.h>

int main(void) {
   float f = -0x1.7a42f6p-63;

   printf("%e\n", f);
   return 0;
}

When I run this program, I get:

$ make a
cc     a.c   -o a
$ ./a
-1.602000e-19

So long as your compiler supports this notation, you need not worry about how the underlying machine represents floats, so long as this particular number fits into its float representation.
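
If you want to see where such a literal comes from, the %a conversion of printf and the strtof function handle the same hexadecimal notation. Below is a minimal sketch; the helper name hex_round_trips is hypothetical, and the exact text that %a produces can differ between implementations, so no particular output string is assumed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format f in hexadecimal floating-point notation
   with %a, parse the text back with strtof, and report whether the
   original value was recovered. Meant for finite values only. */
int hex_round_trips(float f) {
    char buf[64];
    /* float arguments are promoted to double in variadic calls */
    snprintf(buf, sizeof buf, "%a", (double)f);
    return strtof(buf, NULL) == f;
}
```

Printing a float with %a in this way is one means of obtaining a hexadecimal literal for it.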