
I am trying to read a floating point value from a raw bytes array. I have a pointer to the raw bytes array and I would like to read the floating point value those bytes encode. I am using a union to read the floating point value; however, I am not getting the correct value.

// Floating point value: 0x3F800000 (floating point value 1.0)
char * c = "\u003F\u0080\u0000\u0000";
union char4_or_float {
    char element[4];
    float val;
} cf;
cf.element[0] = c[0];
cf.element[1] = c[1];
cf.element[2] = c[2];
cf.element[3] = c[3];
printf("%f", cf.val);

Expected result: 1.0, returned output: 0.0

I want to know if this code is correct. If not, could you please tell me how to fix it? Also, if there are scenarios where it would not work, can you please mention them?

Thank you.

  • Type punning is not allowed in C++. Neither are non-`const` string literals – Ted Lyngmo Feb 21 '23 at 18:40
  • What compiler are you using? – 463035818_is_not_an_ai Feb 21 '23 at 18:41
  • You also need to consider endianness once you fix the type punning – Mike Vine Feb 21 '23 at 18:43
  • @463035818_is_not_a_number Actually `c` needed to be so right from the beginning of C++, though compilers usually do (did?) allow that assignment for compatibility reasons to C... – Aconcagua Feb 21 '23 at 18:44
  • On my platform, the encoded number becomes the very small `0.0000000000000000000000000000000000000118246258769583552581731896732107144376468291533045687148696808` ([demo](https://godbolt.org/z/shPcnjfe7)).... Swapping the two top bytes with the two lower (`const char* c = "\u0000\u0000\u003F\u0080";`) gives `-47.75` ([demo](https://godbolt.org/z/9eqze5PTr)) – Ted Lyngmo Feb 21 '23 at 18:44
  • @Aconcagua afaik `char*` was ok, but nevertheless it was practically `const`, because modifying the string literal was undefined behavior. – 463035818_is_not_an_ai Feb 21 '23 at 18:45
  • The correct way to do this is to `memcpy` the data into an object of the appropriate type, `float` in this case. – NathanOliver Feb 21 '23 at 18:46
  • Modern machines are typically little endian, i.e. the least significant byte has the smallest memory address ('comes first'). So to represent 0x1012 (`short`) in an array you need to store it as `{ 0x12, 0x10 }` – or in your case you could just reverse the order in which you assign from one array to the other. Instead of type punning you might use `memcpy`. By the way: I personally never agreed on the type punning issue if only POD types are involved; my personal response to it was implementing the conversion function *with* type punning in C then ;) – Aconcagua Feb 21 '23 at 18:52
  • @463035818_is_not_a_number Well, you seem [to be right](https://en.cppreference.com/w/cpp/language/string_literal) – *type* of string literals was `char const []` right from the start (which I had in mind), but until C++11 they still were assignable to pointers to non-const – big mistake finally fixed... – Aconcagua Feb 21 '23 at 19:00

4 Answers


You have two problems:

  1. The use of Unicode escapes (\u) doesn't necessarily produce the expected bytes in your string; try const char * c = "\x3F\x80\x00\x00"; instead
  2. You're presumably running on a little-endian machine, but your bytes are big endian, so you need to reverse them when you do your copy:
cf.element[0] = c[3];
cf.element[1] = c[2];
cf.element[2] = c[1];
cf.element[3] = c[0];

All of the above relies on undefined behaviour, though; a memcpy would be much simpler and legal:

#include <cstdio>
#include <cstring>

int main()
{
    // 1.0f as little-endian bytes; reverse the order on a big-endian machine
    const char * c = "\x00\x00\x80\x3f";
    float f;
    std::memcpy(&f, c, sizeof(f));
    std::printf("%f\n", f);
}
Alan Birtles

A few issues:

  1. You should declare the pointer as const char* for string literals

  2. A Unicode literal should be prefixed with "u" and assigned to a const char16_t* instead of a const char*

  3. You could use a const char* literal with "\x" hex escapes

  4. Always use memcpy to avoid aliasing problems. The "union way" is undefined behavior according to the C++ standard, although it tends to work fine in practice.

  5. PCs are little endian, so the order of the bytes is reversed

Here is my take:

#include <cstdio>
#include <cstring>
#include <cstdint>

int main() 
{
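    const char * c = "\x00\x00\x80\x3f";    // little-endian bytes of 1.0f
    const char16_t * d = u"\u0000\u3f80";   // same bytes on a little-endian host
    float val;
    std::memcpy( &val, c, sizeof(val));
    std::printf("%f\n", val);
    std::memcpy( &val, d, sizeof(val));
    std::printf("%f\n", val);

    std::uint32_t ival;
    std::memcpy(&ival,c,sizeof(ival));
    std::printf( "%08x\n", static_cast<unsigned>(ival) );
    std::memcpy(&ival,d,sizeof(ival));
    std::printf( "%08x\n", static_cast<unsigned>(ival) );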
}

This prints

1.000000
1.000000
3f800000
3f800000

Godbolt link: https://godbolt.org/z/nsrGbaYn1

Something Something

You cannot use a union for type punning in a portable way; it is not allowed by standard C++. Moreover, you need to take care of endianness.

In the following I start from a float, which can be viewed as an array of bytes, and copy it into a second array of bytes. I do this to get the proper input for the bytes -> float part, which is then done by copying the bytes into a float with memcpy:

#include <iostream>
#include <cstring>

int main() {

    // prepare the right input with the right endianness
    float x = 1.0f;
    // inspecting an object's bytes through unsigned char is allowed
    const unsigned char* ptr = reinterpret_cast<const unsigned char*>(&x);
    unsigned char ptr_copy[sizeof(float)];   // no new[] needed, so nothing leaks
    for (unsigned i=0;i<sizeof(float);++i) {
        // unsigned char avoids sign extension of bytes >= 0x80
        std::cout << static_cast<unsigned>(ptr[i]) << " ";
        ptr_copy[i] = ptr[i];
    }

    // now ptr_copy is the array of bytes that can be
    // transformed to a float via memcpy
    float y;
    std::memcpy(&y,ptr_copy,sizeof(float));
    std::cout << "\n" << y;
}


463035818_is_not_an_ai

One problem you have is that the string you are trying to create is not what you are actually creating. You are using \u, which is parsed as a Unicode character (a universal character name) rather than a raw byte, so in your case it does not produce the bytes you expect. If you are trying to create the raw bytes for 0x3F800000 in memory, you should escape them with \x, for example like this:

"\x3f\x80\x00\x00"

But that raises a second problem: the endianness of the platform you are working on (probably little endian). Because you are specifying the raw bytes as a sequence in memory, you must be aware of it.

"\x3f\x80\x00\x00" will produce 0x3f800000 in big endian

"\x00\x00\x80\x3f" will produce 0x3f800000 in little endian

So changing that line will make your code work (if you are using a little-endian platform):

// char * c = "\u003F\u0080\u0000\u0000";
const char * c = "\x00\x00\x80\x3f"; // little endian for float 1.0

As you tagged this question as C++, I'll mention that your way of reading the raw bytes into a float would be something like this:

char *rawbytes="...";

float f=*reinterpret_cast<float*>(rawbytes);


If the raw bytes are in a different endianness than your system uses, you will have to swap the bytes. C++23 adds a built-in for this (std::byteswap in <bit>); before that, you will probably have to go with something like this:

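#include <cstdint>
#include <type_traits>

template<typename T, typename std::enable_if<std::is_integral_v<T> && sizeof(T)==4,int>::type=0>
constexpr void binarySwap(T &value) {
    // work on an unsigned copy so the shifts are well defined for signed T
    std::uint32_t tmp = static_cast<std::uint32_t>(value);
    tmp = ((tmp << 8) & 0xFF00FF00) | ((tmp >> 8) & 0xFF00FF);
    value = static_cast<T>((tmp << 16) | (tmp >> 16));
}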

Of course, the byte-swap function depends on the size of the type you are working with; since your question uses a 32-bit float, I'm showing the 32-bit version here.

Pablo Yaggi
  • `float f=*reinterpret_cast<float*>(rawbytes);`: That's an aliasing violation and therefore UB. Compilers will actually have unintended behavior with such uses. OP's union type punning approach, while also UB in the standard, at least is usually guaranteed to work by the individual compilers. – user17732522 Feb 21 '23 at 21:46