I was given a floating-point variable and wanted to know its byte representation, so I went to IDEOne and wrote a simple program to print it. However, to my surprise, it causes a runtime error:

#include <stdio.h>
#include <assert.h>

int main()
{
    // These are their sizes here. So just to prove it.
    assert(sizeof(char) == 1);
    assert(sizeof(short) == 2);
    assert(sizeof(float) == 4);

    // Little endian
    union {
        short s;
        char c[2];
    } endian;
    endian.s = 0x00FF; // would be stored as FF 00 on little
    assert((char)endian.c[0] == (char)0xFF);
    assert((char)endian.c[1] == (char)0x00);

    union {
        float f;
        char c[4];
    } var;
    var.f = 0.0003401360590942204;
    printf("%x %x %x %x", var.c[3], var.c[2], var.c[1], var.c[0]); // little endian
}

On IDEOne, it outputs:

39 ffffffb2 54 4a

along with a runtime error. Why is there a runtime error, and why is the b2 actually ffffffb2? My guess for the b2 is sign extension.

Cole Tobin
  • The ffffffb2 is printed because var.c[2] is a "char", which is a signed data type. printf will sign-extend this to a 32-bit integer (because that's what varargs do). You can either declare it as `unsigned char c[4]` or cast it in the printf. – user295691 Jul 25 '13 at 21:04
  • Have you noticed that the number of significant digits is beyond what a float can hold? Also, use the 'f' suffix on the literal to specify a float, not a double. – notNullGothik Jul 25 '13 at 21:05
  • Why `ffffffb2` instead of `b2` is sign extension, as you guessed. But the run-time error? I hope you post what _definitely_ caused it. (signed v. unsigned, no `\n`, no `return`, etc.) – chux - Reinstate Monica Jul 25 '13 at 22:04
  • @notNullGothik The value `0.0003401360590942204` is converted to `float` when assigned to `var.f`. Unless you suspect this number is prone to double-rounding, there is little point in adding the `f` suffix to the literal. – Pascal Cuoq Jul 25 '13 at 23:30
  • As already commented, `var.c[i]` gets promoted to `int` when passed to `printf()`, because this is how variadic functions work. However, the `%x` format expects a corresponding `unsigned int`. So in addition to using an array of `unsigned char` (each of which will still promote to `int`, because that is how C works), you should call `printf("%x %x %x %x", (unsigned int) var.c[3], …` – Pascal Cuoq Jul 25 '13 at 23:33
  • possible duplicate of [Defined behavior, passing character to printf("%02X"](http://stackoverflow.com/questions/6069203/defined-behavior-passing-character-to-printf02x) – Cole Tobin Jul 26 '13 at 07:47
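
A minimal sketch of the mechanism the commenters describe, assuming an 8-bit signed `char` and a 32-bit `int` (as on IDEOne):

#include <stdio.h>

int main(void)
{
    char c = (char)0xB2; /* negative where plain char is signed and 8 bits */

    /* Variadic arguments undergo default promotion: char -> int.
       A negative char sign-extends, and %x then shows the promoted
       bit pattern, hence the leading f's. */
    printf("%x\n", c);                              /* typically ffffffb2 */
    printf("%x\n", (unsigned char)c);               /* b2 */
    printf("%x\n", (unsigned int)(unsigned char)c); /* b2, fully portable */
    return 0;
}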

3 Answers

6

`char` is a signed type on most platforms (whether a plain `char` is signed is implementation-defined). If it's 8 bits long and you put anything greater than 127 in it, it will overflow. Signed integer overflow is undefined behavior, and so is printing a signed value using a conversion specifier that expects an unsigned one (`%x` expects `unsigned int`, but `char` is promoted [implicitly converted] to signed `int` when passed to the variadic `printf()` function).

Bottom line: change `char c[4]` to `unsigned char c[4]` and it will work fine.
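
A minimal sketch of that fix applied to the question's union, assuming the same 4-byte little-endian float as on IDEOne:

#include <stdio.h>

int main(void)
{
    union {
        float f;
        unsigned char c[4]; /* unsigned: each byte promotes to a non-negative int */
    } var;

    var.f = 0.0003401360590942204;

    /* Each element now promotes to an int in [0, 255], so %x prints
       at most two hex digits and no sign extension can occur. */
    printf("%x %x %x %x\n", var.c[3], var.c[2], var.c[1], var.c[0]);
    return 0;
}

On IDEOne this should print 39 b2 54 4a.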

  • C99 5.2.4.2.1 p2 says the `char` may be signed or unsigned. It is implementation-defined. It is stated a bit more plainly in 6.3.1.1 p3: "As discussed earlier, whether a 'plain' char is treated as signed is implementation-defined." – jxh Jul 25 '13 at 22:11
  • “Signed integer overflow is undefined behavior” -> Where is there an undefined overflow in this question? Overflows in conversions to a signed type are implementation-defined. I do not see any other type of overflow. – Pascal Cuoq Jul 25 '13 at 23:37
  • @PascalCuoq [This](http://stackoverflow.com/questions/3948479/integer-overflow-and-undefined-behavior) says it's UB. (Perhaps it's IB in C++?) –  Jul 26 '13 at 04:13
  • @H2CO3 “Signed integer overflow is UB” is generally true but it lacks nuance. I am not disputing that for a sentence this length, it is as accurate as it can be, but the detail is that conversions to a signed type are in fact implementation-defined. The question you linked is for arithmetic overflows such as `0x10000 * 0x10000` on a 32-bit compiler. The only overflow I see in this question is for `(char)0xFF`, which most implementations (signed 8-bit `char`, wrap-around for overflow) define as `(char)-1`. In C99, this is in 6.3.1.3:3. – Pascal Cuoq Jul 26 '13 at 08:05
  • @PascalCuoq Thanks, I'll have a look at that clause. –  Jul 26 '13 at 08:25
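
To illustrate the distinction drawn in this thread, a small sketch, assuming the common signed, 8-bit, wrap-around `char`:

#include <stdio.h>

int main(void)
{
    /* Conversion of an out-of-range value to a signed type is
       implementation-defined (C99 6.3.1.3p3), not undefined;
       typical implementations wrap, so 0xFF becomes -1. */
    char c = (char)0xFF;
    printf("%d\n", c); /* typically -1 */

    /* Arithmetic overflow, by contrast, is undefined behavior,
       e.g. 0x10000 * 0x10000 with a 32-bit int. */
    return 0;
}
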
5

Replacing `char` with `unsigned char` in the union and adding a `return 0;` at the end fixes all the problems: http://ideone.com/ienG2b.

hivert
  • Why do I need `return 0`? I thought the standard said it's optional and the compiler must implicitly add it? – Cole Tobin Jul 25 '13 at 21:04
  • Any explanation as to **why** to do these? –  Jul 25 '13 at 21:04
  • @ColeJohnson AFAIK that's C99 (which we should use). `return 0;` is not implicit in C89. –  Jul 25 '13 at 21:04
  • The unsigned problem was perfectly explained by @H2CO3. For the return, it's the first time I'm using ideone.com and I don't know which C they are using and how to configure it. So it was just a guess. Sorry. – hivert Jul 25 '13 at 21:09
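
For reference, a sketch of the `return` point: under C99 (5.1.2.2.3), reaching the closing brace of `main` is equivalent to `return 0;`, while under C89 the termination status is undefined, which is presumably what made IDEOne report a runtime error:

/* Compiled as C99, this program returns 0 to the host environment.
   Compiled as C89, the termination status is undefined, so the
   host may see a nonzero (failure) status. */
int main(void)
{
    /* no explicit return */
}
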
2

Your approach is all kinds of wrong. Here's how you print a general object's binary representation:

#include <cstdio>
#include <cstddef>

template <typename T>
void hexdump(T const & x)
{
    // Any object may be inspected through a pointer to unsigned char.
    unsigned char const * p = reinterpret_cast<unsigned char const *>(&x);
    for (std::size_t i = 0; i != sizeof(T); ++i)
    {
        std::printf("%02X", p[i]); // two hex digits per byte, in memory order
    }
}

The upshot is that you can always interpret any object as a character array and thus reveal its representation.
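
For instance, a short driver (assuming the `hexdump` template above is in scope; the exact digits depend on the platform's endianness and float format):

int main()
{
    float f = 0.0003401360590942204f;
    hexdump(f); // prints 4A54B239 on a little-endian IEEE-754 platform:
                // the question's output in memory order rather than reversed
    std::printf("\n");
}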

Kerrek SB