
I have to encode the electron charge, which is -1.602 × 10^-19 C, using IEEE-754. I did it manually and verified my result using this site. So I know my representation is good. My problem is that, if I try to build a C program showing my number in scientific notation, I get the wrong number.

Here is my code:

#include <stdio.h>
int main(int argc, char const *argv[])
{
    float q = 0xa03d217b;
    printf("q = %e", q);
    return 0;
}

Here is the result:

$ ./test.exe
q = 2.688361e+09

My question: Is there another representation that my CPU might be using internally for floating point other than IEEE-754?

Adrian Mole
Corentin F
  • If you want precision why are you using `float` instead of `double`? What is the motivation behind using a hexadecimal encoded float? This is all a very, very bad idea. – tadman Dec 28 '20 at 16:43
  • `double q = -1.602e-19`. Done. Don't play games here, you'll win dumb prizes. Just express it in its minimal form. – tadman Dec 28 '20 at 16:43
  • What CPU are you using? Did you read the documentation for it to check how a `float` is represented internally? Did you account for [endian](https://en.wikipedia.org/wiki/Endianness) issues? – tadman Dec 28 '20 at 16:45
  • I precisely have to encode the number on 32 bits. I also forgot to mention that this is an exercise. I had to encode the number using IEEE754 manually. This C program is just here to verify the result. – Corentin F Dec 28 '20 at 16:47

4 Answers


The line float q = 0xa03d217b; converts the integer (hex) literal to a float by value, not by bit pattern. The literal equals 2,688,360,827 in decimal, so q is assigned the nearest representable float to that number, which is what your output of 2.688361e+09 shows.

If you must initialize a float variable from its internal IEEE-754 (hex) representation, your best option is type punning through the members of a union (well-defined in C but not in C++):

#include <stdio.h>

typedef union {
    float f;
    unsigned int h;
} hexfloat;

int main()
{
    hexfloat hf;
    hf.h = 0xa03d217b;
    float q = hf.f;
    printf("%lg\n", q);
    return 0;
}

There are also some 'quick tricks' using pointer casting, like:

unsigned iee = 0xa03d217b;
float q = *(float*)(&iee);

Be aware, though, that such approaches have several problems, including potential endianness mismatches and the fact that the pointer cast violates the strict aliasing rule.

Adrian Mole
  • This is probably the original intent, but yeah, there's a *lot* of caveats, so it's nice of you to lay those out. – tadman Dec 28 '20 at 16:53
  • You can avoid the aliasing problems by using a `union`, and I've never seen nor heard of a machine that uses different endianness for ints and floats, but it is technically possible. More likely is a machine that uses non-IEEE fp, though those are rare these days. – Chris Dodd Dec 28 '20 at 16:57
  • @ChrisDodd I was adding my edit while you were posting your comment. – Adrian Mole Dec 28 '20 at 16:59
  • Why bother giving somebody an improper method when an easy proper method exists? There is no reason to present the aliasing-via-pointer method at all. The union method is fine in C, but so is copying bytes, and that is defined (up to implementation issues) in C and C++. – Eric Postpischil Dec 28 '20 at 17:07
  • @EricPostpischil Good point. I have rearranged my answer to put the 'proper' method first; however, I have retained the 'improper' method by way of a warning *not* to use it. I have not included the method using `memcpy` (although it is good) because that is **your** answer, not mine. – Adrian Mole Dec 28 '20 at 17:13

"My problem is that, if I try to build a C program showing my number in scientific notation, I get the wrong number."

What if your target machine might or might not use IEEE754 encoding? Copying the bit pattern may fail.

If starting with a binary32 constant 0xa03d217b, code could examine it and then build up the best float available for that implementation.

#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#define BINARY32_MASK_SIGN 0x80000000
#define BINARY32_MASK_EXPO 0x7F800000  // the 8 exponent bits, 30..23
#define BINARY32_MASK_SNCD 0x007FFFFF
#define BINARY32_IMPLIED_BIT 0x800000
#define BINARY32_SHIFT_EXPO 23

float binary32_to_float(uint32_t x) {
  // Break up into 3 parts
  bool sign = x & BINARY32_MASK_SIGN;
  int biased_expo = (x & BINARY32_MASK_EXPO) >> BINARY32_SHIFT_EXPO;
  int32_t significand = x & BINARY32_MASK_SNCD;

  float y;
  if (biased_expo == 0xFF) {
    y = significand ? NAN : INFINITY;  // For simplicity, NaN payload not copied
  } else {
    int expo;
    if (biased_expo > 0) {
      significand |= BINARY32_IMPLIED_BIT;
      expo = biased_expo - 127;
    } else {
      expo = -126;  // subnormal: no implied bit, minimum exponent
    }
    y = ldexpf((float)significand, expo - BINARY32_SHIFT_EXPO);
  }
  if (sign) {
    y = -y;
  }
  return y;
}

Sample usage and output

#include <float.h>
#include <stdio.h>
int main() {
  float e = -1.602e-19;
  printf("%.*e\n", FLT_DECIMAL_DIG, e);
  uint32_t e_as_binary32 = 0xa03d217b;
  printf("%.*e\n", FLT_DECIMAL_DIG, binary32_to_float(e_as_binary32));
}

-1.602000046e-19
-1.602000046e-19
chux - Reinstate Monica

Hence, q does not contain the value you expect. The hex value is converted to a float with the same numeric value (approximately), not to a float with the same bit representation.

When compiled with clang and the option -Wall, there is a warning:

warning: implicit conversion from 'unsigned int' to 'float' changes value from 2688360827 to 2688360704 [-Wimplicit-const-int-float-conversion]

Can be tested on Compiler Explorer.

This warning is apparently not supported by gcc. Instead, you can use the option -Wfloat-conversion (which is not part of -Wall -Wextra):

warning: conversion from 'unsigned int' to 'float' changes value from '2688360827' to '2.6883607e+9f' [-Wfloat-conversion]

Again on Compiler Explorer.

Bktero

Note that C (since C99) supports hexadecimal floating-point literals. See https://en.cppreference.com/w/cpp/language/floating_literal for details. This notation lets you write the number portably and exactly, with none of the rounding concerns that come with regular decimal/scientific notation. Here's the number you're interested in:

#include <stdio.h>

int main(void) {
   float f = -0x1.7a42f6p-63;

   printf("%e\n", f);
   return 0;
}

When I run this program, I get:

$ make a
cc     a.c   -o a
$ ./a
-1.602000e-19

So long as your compiler supports this notation, you need not worry about how the underlying machine represents floats, so long as this particular number fits into its float representation.

alias