convert char* to float* using union or memcpy

Question

I am trying to convert a stream of characters into a float *, but I am unable to get the right result.

char  *char_data_  = static_cast<char *>("abcdefghijklmnopqrstuvwx");
float *float_data_ = reinterpret_cast<float *>(malloc(strlen(char_data_)/sizeof(float)));

printf("%ld\n", strlen(char_data_));

memcpy(float_data_, reinterpret_cast<float *>(char_data_), strlen(char_data_)/sizeof(float));

for ( auto n = 0 ; n < strlen(char_data_)/sizeof(float); n++) {
    printf("%f\n", *(float_data_ + n));
}

The following is my result, but it is clearly incorrect. Can anyone have a look, please?

16777999408082104352768.000000
0.000000
0.000000
0.000000
0.000000
0.000000

Is it possible to solve this problem using union?

I expect to take the character stream in chunks of 4 bytes each and store those chunks in an array of floats.

@dbush - i am trying to extract 4 bytes each from the character stream and assign it to an array of floats. — infoclogged, Jul 18 '18 at 20:02
***i am trying to extract 4 bytes each from the character stream and assign it to an array of floats.*** Then your ***I was expecting*** part of your question is completely wrong. You should be expecting some floating point numbers. — drescherjm, Jul 18 '18 at 20:03
A `float` expects data ordered in IEEE-754 single precision floating point format. I've never seen a Latin alphabet that comes in IEEE-754 format. — David C. Rankin, Jul 18 '18 at 20:03
" i am trying to extract 4 bytes each from the character stream...." this is simply not doable. What float do u expect to get from "hijk" — pm100, Jul 18 '18 at 20:05
@pm100 At least in IEEE-754 each bit pattern means something. A lot of them are NAN though. — Max, Jul 18 '18 at 20:06
If the bytes in question are in fact a list of IEEE-754 floating point numbers with the bytes in the same order the host expects, it could be as simple as `float *float_data = reinterpret_cast<float *>(char_data);` — dbush, Jul 18 '18 at 20:09
@dbush - why is the first result a valid floating point and the rest are 0s? — infoclogged, Jul 18 '18 at 20:15
@infoclogged Because you aren't copying enough bytes. You need `strlen(char_data_)`, not `strlen(char_data_)/sizeof(float)`. Also, a real stream of floating point numbers may contain bytes with the value 0, in which case using string functions is useless. — dbush, Jul 18 '18 at 20:17
See https://en.cppreference.com/w/cpp/string/byte/memcpy the cast in the call is also useless. — Bob__, Jul 18 '18 at 20:20
Tip: 1) Using `unsigned char` rather than `char` reduces a number of minor annoyances and technical UB. 2) Use `"%e"` or `"%a"`. They are more informative than `printf("%f\n", ...` — chux - Reinstate Monica, Jul 18 '18 at 21:00
"but its clear its incorrect." --> It is not clear that output is incorrect. Post the expected output. — chux - Reinstate Monica, Jul 18 '18 at 21:03
***Post the expected output*** That was posted but retracted because the expected output was also wrong. — drescherjm, Jul 18 '18 at 21:15
@drescherjm The expected output is chunks of 4 bytes each from the character stream. The problem is solved now: first, I was allocating the wrong number of bytes in malloc, and second, I misunderstood floats, because a float has a special representation and hence cannot be meaningfully converted one to one from a character stream. Both of them are addressed in the solution below. — infoclogged, Jul 18 '18 at 21:35
@DavidC.Rankin `float` is not necessarily IEEE754. It could be any floating point representation. — M.M, Jul 18 '18 at 23:28
@infoclogged yes every bit combination in a char string is a valid int. For example "aa" cast as a 16 bit int is 24929 (see if you can work out why). BTW there are all sorts of portability, language compliance, and endianness issues associated with this — pm100, Jul 20 '18 at 17:52

KamilCuk · Answer 1 · 2018-07-19T16:16:27.380

How to convert char* to float* using union or memcpy?

Using memcpy and reinterpret_cast:

#include <cstdio>
#include <string>
#include <cstring>
#include <cassert>
#include <cstddef>

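// Note: reading char storage through a float* like this violates strict aliasing
// (and may be misaligned), so it is technically undefined behaviour; prefer using_memcpy.
void using_pointer(const char *s, size_t slen) {
    const float *f = reinterpret_cast<const float*>(s);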
    for (size_t i = 0; i < slen/sizeof(float); ++i) {
        printf("%zu = %f\n", i, f[i]);
    }
}

void using_memcpy(const char* s, size_t slen) {
    float* f = new float[slen/sizeof(float)];
    memcpy(f, s, slen/sizeof(float)*sizeof(float));
    for (size_t i = 0; i < slen/sizeof(float); ++i) {
        printf("%zu = %f\n", i, f[i]);
    }
    delete[] f; // memory from new[] must be released with delete[]
}

int main() {
    static_assert(sizeof(float) == 4, "");
    std::string stdstr(
        "\x00\x00\x80\x3f" // float 1.0
        "\x00\x00\x81\x3f" // just randomly changed 0x80 to 0x81 
        "\x00\x00\x82\x3f"
    , 4 * 3);

    printf("using_pointer:\n");
    using_pointer(stdstr.c_str(), stdstr.size());
    printf("using_memcpy:\n");
    using_memcpy(stdstr.c_str(), stdstr.size());

    std::string s2("abcdefghijklmnopqrstuvwx");
    printf("Last:\n");
    using_memcpy(s2.c_str(), s2.size());

    return 0;
}

This prints the following on the machine used by http://www.onlinegdb.com :

using_pointer:
0 = 1.000000
1 = 1.007812
2 = 1.015625
using_memcpy:
0 = 1.000000
1 = 1.007812
2 = 1.015625
Last:
0 = 16777999408082104352768.000000
1 = 4371022013021616997400576.000000
2 = 1138400301458999111806091264.000000
3 = 296401655701622853703074578432.000000
4 = 77151445562813935304650187079680.000000
5 = 20076561220099179535696200212676608.000000

@edit: As @drescherjm pointed out in the comments, converting between the char and float representations through a union is undefined behavior in C++. In C++, unions are not meant for reinterpreting bytes; a union stores at most one of its members at a time. You can store either a char array or a float array in a union, but not both at once, so you can't (or at least shouldn't) use a union to convert between the float and char representations in C++.
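As a hedged illustration of the alternative to a union, the sketch below copies the bytes into a float with memcpy. The helper name `float_from_bytes` is made up for this example and is not part of the code above; it assumes `float` is 4 bytes and that the bytes are already in the host's own float representation and byte order.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not from the answer's code): builds a float from
// 4 raw bytes without a union and without aliasing the buffer as float.
// Assumption: the bytes are in the host's native float layout.
float float_from_bytes(const unsigned char* bytes) {
    static_assert(sizeof(float) == 4, "this sketch assumes a 4-byte float");
    assert(bytes != nullptr);
    float value;
    // memcpy copies the object representation byte by byte.
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

Compilers typically turn such a fixed-size memcpy into a single load, so this is usually no slower than the pointer cast.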

What is happening in your code?

// static_cast<char*> from a string literal is forbidden in ISO C++. It's better to use at least std::string::c_str(),
// or use char char_data_[sizeof("abcdefghijklmnopqrstuvwx")]; memcpy(char_data_, "abcdefghijklmnopqrstuvwx", sizeof(char_data_));
char  *char_data_  = static_cast<char *>("abcdefghijklmnopqrstuvwx");
// you are allocating strlen("abcdefghijklmnopqrstuvwx")/4 = 24/4 = 6 bytes of memory. That's enough memory for only one and a half floats.
float *float_data_ = reinterpret_cast<float *>(malloc(strlen(char_data_)/sizeof(float)));

// this will print '24\n'
printf("%ld\n", strlen(char_data_));

// you are copying 6 bytes of data from char_data_ to float_data_
// now float_data_ contains "abcdef" without '\0'
memcpy(float_data_, reinterpret_cast<float *>(char_data_), strlen(char_data_)/sizeof(float));

// this will print 16777999408082104352768.000000 on the first iteration, which is the ASCII bytes of "abcd" (0x61 0x62 0x63 0x64) read as a float
// after that it invokes undefined behaviour because of out-of-bounds access:
// you are trying to access elements 1, 2, 3, 4 and 5 while float_data_ points to only 6 bytes, which is 1.5 floats, so even float_data_[1] is out of bounds and undefined behaviour
for ( auto n = 0 ; n < strlen(char_data_)/sizeof(float); n++) {
    printf("%f\n", *(float_data_ + n));
}
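Putting the points above together, a corrected version of the question's snippet might look like the following sketch. The helper name `bytes_to_floats` is hypothetical; it sizes the buffer in floats, copies a whole number of floats in bytes, and leaves cleanup to `std::vector`. It assumes the input bytes already use the host's float representation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not from the original post): copies as many whole
// floats as fit into `len` bytes starting at `bytes`.
std::vector<float> bytes_to_floats(const char* bytes, std::size_t len) {
    assert(bytes != nullptr || len == 0);
    // Element count is len / sizeof(float); the byte count passed to
    // memcpy is that count multiplied back by sizeof(float).
    std::vector<float> out(len / sizeof(float));
    if (!out.empty()) {
        std::memcpy(out.data(), bytes, out.size() * sizeof(float));
    }
    return out;
}
```

Calling `bytes_to_floats(char_data_, strlen(char_data_))` on the 24-character string gives 6 floats, all of them inside the allocated storage. For a real byte stream, pass the known length instead of using `strlen`, since float data may contain zero bytes.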

thank you very much ! malloc was the problem that you correctly identified. Let me have a look at the union and will get back. — infoclogged, Jul 18 '18 at 21:01
Aren't both methods Undefined Behavior? I know the union is definitely UB in `c++`. https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior — drescherjm, Jul 18 '18 at 21:17
I see. Using union is undefined behavior. Why should be memcpy UB? I am sure that under my machine (!) the bytes stored in string represent floats, so i can convert the representation, as i know that the underlying bytes represent the float object. — KamilCuk, Jul 18 '18 at 23:05
"using pointer" version causes undefined behaviour (strict aliasing violation) — M.M, Jul 18 '18 at 23:27
But the underlying memory contains an object that is of compatible type with float, so it's not strict aliasing violation. I know that this memory contains float object on my machine, so it's not strict alias violation. Is it? — KamilCuk, Jul 18 '18 at 23:47
`unsigned int val = 0x0000803f; float float_val = *(reinterpret_cast<float *>( &val )); printf("answer - %f\n", float_val);` - I was expecting 1.0f as the result. But its not.. Any reason why? — infoclogged, Jul 19 '18 at 13:47
This code (yours and mine) is so wrong on so many levels. The representation of unsigned int is not equal to bytes. I mean `!memcmp(&val, (uint8_t[4]){0x00,0x00,0x80,0x3f}, 4)` may return false. sizeof(unsigned int) may not be equal to sizeof(float). Read about strict aliasing violations, endianness and float representation. Btw. Why do you want to convert a char array into float? Isn't this an XY problem? The only valid reason would be to explore float representation on a particular machine; such code has no production value. — KamilCuk, Jul 19 '18 at 16:15
@KamilCuk i want to convert char to float because this is a byte stream on the socket. The byte stream represents a depth image that is encoded in float and I want to extract these floats. — infoclogged, Jul 24 '18 at 18:12
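For the socket use case in the last comment, one possible approach is to fix the wire format and decode it explicitly. The sketch below is only an illustration under stated assumptions: the sender transmits little-endian IEEE-754 single-precision values (the same byte order as the `"\x00\x00\x80\x3f"` example for 1.0 in the answer), and the receiving host's `float` is also IEEE-754 binary32. The function name `decode_le_floats` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: decodes `len` bytes holding little-endian IEEE-754
// binary32 values. Trailing bytes that do not fill a whole float are ignored.
// Assumption: the host float is IEEE-754 binary32.
std::vector<float> decode_le_floats(const unsigned char* bytes, std::size_t len) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    assert(bytes != nullptr || len == 0);
    std::vector<float> out;
    out.reserve(len / 4);
    for (std::size_t i = 0; i + 4 <= len; i += 4) {
        // Assemble the bit pattern with shifts so the host's byte order
        // does not affect the result.
        std::uint32_t bits = std::uint32_t(bytes[i])
                           | std::uint32_t(bytes[i + 1]) << 8
                           | std::uint32_t(bytes[i + 2]) << 16
                           | std::uint32_t(bytes[i + 3]) << 24;
        float value;
        std::memcpy(&value, &bits, sizeof value);
        out.push_back(value);
    }
    return out;
}
```

This also shows why `0x0000803f` did not give 1.0f in the earlier comment: the bit pattern of 1.0f as an integer is `0x3f800000`, and `00 00 80 3f` is only how that value is laid out in memory on a little-endian machine.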
