Convert char buffer to struct

Question

I have a char buffer buf containing buf[0] = 10, buf[1] = 3, buf[2] = 3, buf[3] = 0, buf[4] = 58,

and a structure:

typedef struct
{ 
    char type;
    int version;
    int length;
}Header;

I want to convert buf into a Header. At the moment I am using these functions:

int getByte( unsigned char* buf)
{
    int number = buf[0]; 
    return number;
}

int getInt(unsigned char* buf)
{
    int number =  (buf[0]<<8)+buf[1];
    return number;
}

int main()
{
    Header *head = new Header;
    int location = 0;

    head->type = getByte(&buf[location]);
    location++;     // location = 1

    head->version = getInt(&buf[location]);
    location += 2;  // location = 3

    head->length = getInt(&buf[location]);
    location += 2;  // location = 5 
}

I am searching for a solution such as

 Header *head = new Header;

 memcpy(head, buf, sizeof(head));

With this, the first field of the Header, head->type, is correct and the rest is garbage. Is it possible to convert an unsigned char* buf to a Header?

Since int is 2 bytes and char is 1 byte, the size of head should be 5. I will confirm it. — Akhil V Suku, Aug 18 '16 at 11:31
Your char[5] is 5 bytes and your Header is 9 bytes, so Header.length will never be initialised. — Guillaume Kiz, Aug 18 '16 at 11:33
Don't use memcpy; it is an unsafe and non-portable way. The only fully portable and safe way is something like what you do currently (see http://stackoverflow.com/questions/37430047/converting-bytes-array-to-integer) — Garf365, Aug 18 '16 at 11:43
If you don't need portability you can use a compiler-specific feature, the "packing" of a struct. This is at least possible with GCC and the Microsoft C++ compiler. But that requires that the target CPU supports unaligned access. Therefore you should provide information about your toolchain and the target platform and CPU. — harper, Aug 18 '16 at 11:59
@GuillaumeKiz I don't agree, if `sizeof(int)` is equal to 2 bytes (see http://stackoverflow.com/questions/11438794/is-the-size-of-c-int-2-bytes-or-4-bytes) — Garf365, Aug 18 '16 at 12:10

score 5 · Accepted Answer · edited May 23 '17 at 12:17

The only fully portable and safe way is:

void convertToHeader(unsigned char const * const buffer, Header *header)
{
    header->type = buffer[0];
    header->version = (buffer[1] <<  8) | buffer[2];
    header->length = (buffer[3] <<  8) | buffer[4];
}

and

void convertFromHeader(Header const * const header, unsigned char * buffer)
{
    buffer[0] = header->type;
    buffer[1] = (static_cast<unsigned int>(header->version) >>  8) & 0xFF;
    buffer[2] = header->version & 0xFF;
    buffer[3] = (static_cast<unsigned int>(header->length) >>  8) & 0xFF;
    buffer[4] = header->length & 0xFF;
}
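
As a quick sanity check, the two functions can be run back to back on the bytes from the question. This is only a sketch: `roundTripMatches` is a hypothetical helper name, and the two conversion functions are repeated from above so the snippet compiles on its own.

```cpp
#include <cassert>

// Same layout as the Header in the question.
struct Header
{
    char type;
    int version;
    int length;
};

// Repeated from the answer so that this sketch is self-contained.
void convertToHeader(unsigned char const * const buffer, Header *header)
{
    header->type = buffer[0];
    header->version = (buffer[1] << 8) | buffer[2];
    header->length = (buffer[3] << 8) | buffer[4];
}

void convertFromHeader(Header const * const header, unsigned char *buffer)
{
    buffer[0] = header->type;
    buffer[1] = (static_cast<unsigned int>(header->version) >> 8) & 0xFF;
    buffer[2] = header->version & 0xFF;
    buffer[3] = (static_cast<unsigned int>(header->length) >> 8) & 0xFF;
    buffer[4] = header->length & 0xFF;
}

// Hypothetical helper: decode the question's bytes, encode them again,
// and report whether the five wire bytes survive the round trip.
bool roundTripMatches()
{
    unsigned char const wire[5] = {10, 3, 3, 0, 58};

    Header header;
    convertToHeader(wire, &header);
    // version = 3 * 256 + 3 = 771, length = 0 * 256 + 58 = 58
    assert(header.type == 10 && header.version == 771 && header.length == 58);

    unsigned char encoded[5];
    convertFromHeader(&header, encoded);
    for (int i = 0; i < 5; ++i)
    {
        if (encoded[i] != wire[i])
            return false;
    }
    return true;
}
```

Calling `roundTripMatches()` from `main` should return true on any platform, since every byte is placed explicitly rather than copied in host order.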

Example

see Converting bytes array to integer for explanations

EDIT

A quick summary of the previous link: other possible solutions (memcpy or a union, for example) are not portable, because endianness differs between systems (what you are doing is probably some sort of communication between at least two heterogeneous systems): on some systems byte[0] is the LSB of the int and byte[1] is the MSB, and on others it is the reverse.

Also, due to alignment, struct Header can be bigger than 5 bytes (probably 6 bytes in your case, if alignment is 2 bytes!) (see here for example)

Finally, because of alignment restrictions and aliasing rules, the compiler can generate incorrect code on some platforms.
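
To make the endianness point concrete, here is a small sketch comparing the two approaches on the length bytes from the question (0, 58). The function names are made up for illustration, and the snippet assumes the platform provides `std::uint16_t` with 8-bit bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Decode two wire bytes as a big-endian 16-bit value.
// The result is the same on every host.
std::uint16_t decodeWithShifts(unsigned char const *bytes)
{
    assert(bytes != nullptr);
    return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}

// Copy the same two bytes straight into an integer.
// The result depends on the byte order of the host.
std::uint16_t decodeWithMemcpy(unsigned char const *bytes)
{
    assert(bytes != nullptr);
    std::uint16_t value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Hypothetical helper: true when the host stores the least
// significant byte of an integer at the lowest address.
bool hostIsLittleEndian()
{
    std::uint16_t const probe = 1;
    unsigned char firstByte = 0;
    std::memcpy(&firstByte, &probe, 1);
    return firstByte == 1;
}
```

For the bytes {0, 58}, `decodeWithShifts` gives 58 everywhere, while `decodeWithMemcpy` gives 58 on a big-endian host and 58 * 256 = 14848 on a little-endian one.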

answered Aug 18 '16 at 11:47 by Garf365 · edited May 23 '17 at 12:17 by Community

using VS compiler, sizeof(Header) is 12 bytes, because of the alignment rule in Windows. – very hit Aug 18 '16 at 12:07
@veryhit Of course, see the alignment example (Linux with gcc => 12 bytes also). I edited to clarify my answer, because I had only considered the case of the user, who seems to be on a 16-bit platform – Garf365 Aug 18 '16 at 12:09
Why all the explicit (and unfortunately C-style) casts to `std::uint16_t`? First, that type is not even guaranteed to exist, and I don't know why you didn't just use `int`... and anyway, don't the shift/`&` operators perform any required casting to `int`? (which can hold the same values as `uint16_t` if it _did_ exist) – underscore_d Aug 18 '16 at 12:24
@underscore_d `uint16_t` is defined in `stdint.h`. Also, OP wants to manipulate data at the byte level, so I use a type with a well-defined size. And because `<<` has undefined behavior on signed integers and the result of `>>` on signed integers is implementation-defined (see [here](http://stackoverflow.com/questions/11644362/are-the-results-of-bitwise-operations-on-signed-integers-defined)), I only perform shift operations on unsigned integers – Garf365 Aug 18 '16 at 12:29
I know very well where the exact width `[u]int??_t` types are defined, but they are only defined _if_ the platform provides a corresponding exact-width type; they are not required to exist. Those "undefined" and "implementation defined" cases are only if the value being shifted is negative, as the linked answer states. Which, yeah, they could be because the OP's buffer is one of `char`s, which has implementation-defined signedness... but casting them only moves the problem later, as you might wrap negative values, and the OP should just use `unsigned char`, the proper type for raw byte stuff. – underscore_d Aug 18 '16 at 12:39

Francesco Dondi · Answer 2 · 2016-08-18T13:25:59.563

What you want would need your version and length to have the same length as 2 elements of your buf array; that is, you'd need to use the type uint16_t, defined in <cstdint>, rather than int, which is likely longer. You'd also need to make buf an array of uint8_t, as char is allowed to be wider than 8 bits!

You probably also need to move type to the end; otherwise the compiler will almost certainly insert a padding byte after it to be able to align version to a 2-byte boundary (once you have made it uint16_t and thus 2 bytes), and then your buf[1] would end up there rather than where you want it. This is probably what you observe right now, by the way: by having a char followed by an int, which is probably 4 bytes, you have 3 bytes of padding, and elements 1 to 3 of your array are being inserted there (=lost forever).
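
If you want to see the padding on your own toolchain, `offsetof` reports where each field actually starts. A minimal sketch, using the Header from the question; the two function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the Header in the question.
struct Header
{
    char type;
    int version;
    int length;
};

// Number of padding bytes the compiler placed between type and version.
std::size_t paddingAfterType()
{
    // The first member of a standard-layout struct starts at offset 0.
    assert(offsetof(Header, type) == 0);
    return offsetof(Header, version) - (offsetof(Header, type) + sizeof(char));
}

// True only if a raw memcpy of the wire bytes would put buf[1] on the
// first byte of version, i.e. when there is no padding after type.
bool versionStartsAtWireOffset()
{
    return offsetof(Header, version) == 1;
}
```

With a 4-byte, 4-byte-aligned int, `paddingAfterType()` would typically be 3 and `versionStartsAtWireOffset()` false, which matches the garbage described in the question.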

Another solution would be to modify your buf array to be longer and have empty padding bytes as well, so that the data will be actually aligned with the struct fields.

Worth mentioning again is that, as pointed out in the comments, sizeof(head) returns the size of a pointer on your system, not of the Header structure. You can directly write sizeof(Header); but at this level of micromanagement, you won't be losing any more flexibility if you just write "5", really.
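
The difference between the two sizeof expressions can be checked with a short sketch like the one below. The function names and the `kWireHeaderSize` constant are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the Header in the question.
struct Header
{
    char type;
    int version;
    int length;
};

// Hypothetical named constant for the 5-byte wire format.
constexpr std::size_t kWireHeaderSize = 5;

// What the memcpy in the question uses: head is a pointer,
// so this is the size of a pointer, not of the struct.
std::size_t sizeOfPointer()
{
    Header *head = nullptr;
    return sizeof(head); // sizeof does not evaluate its operand
}

// What was probably intended: the size of the object pointed to.
std::size_t sizeOfStruct()
{
    Header *head = nullptr;
    assert(sizeof(*head) == sizeof(Header));
    return sizeof(*head);
}
```

Neither value has to equal `kWireHeaderSize`, which is why a named constant for the wire size is no worse than either sizeof here.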

Also, endianness can screw with you. Processors have no obligation to store the bytes of a number in the order you expect rather than the opposite one; both make internal sense, after all. This means that blindly copying bytes buf[0], buf[1] into a number can result in (buf[0]<<8)+buf[1], but also in (buf[1]<<8)+buf[0], or even in (buf[0]<<24)+(buf[1]<<16) if the data type is 4 bytes (as int usually is). And even if it works on your computer now, there is at least one out there where the same code will result in garbage. Unless, that is, those bytes actually come from reinterpreting a number in the first place. In that case, however, it is the current code that is wrong (not portable).

...is it worth it?

All things considered, my strong advice is to keep handling them the way you do now. Maybe simplify it.

It really makes no sense to convert a byte to an int and then to a byte again, or to take the address of a byte only to dereference it again; nor is there any need for helper variables with no descriptive name and no purpose other than being returned, or for a variable whose value you know in advance at all times.

Just do

int getTwoBytes(unsigned char* buf)
{
    return (buf[0]<<8)+buf[1];
}

int main()
{
    Header *head = new Header;

    head->type = buf[0];

    head->version = getTwoBytes(buf + 1);

    head->length = getTwoBytes(buf + 3);
}

"_What you want would need your version and length to have the same length as 2 `char`s; that is if you used the type `uint16_t`_" This is a false and dangerously non-portable assumption. `uint16_t` is only required to be exactly 16 bits wide; it has no relation whatsoever to the width of `char`. For example, on a platform where `char` is exactly 16 bits wide (yes, they exist), then `sizeof(char) == sizeof(std::uint16_t) == 1`, which clearly contradicts what you said. — underscore_d, Aug 18 '16 at 12:44
It seemed like the least of worries in all this that `char` could have a different size, but yes you're right of course. I'll update to cover this. — Francesco Dondi, Aug 18 '16 at 13:22

score -1 · Answer 3 · answered Aug 18 '16 at 13:33

The better way is to create some sort of serialization/deserialization routines.

Also, I wouldn't use plain int or char types, but more specific ones such as int32_t. That is simply the platform-independent way (well, actually you can also pack your data structures with pragma pack).

    #include <cstdint>
    #include <cstring>
    #include <memory>
    #include <vector>

    struct Header
    {
        char16_t type;
        std::int32_t version;
        std::int32_t length;
    };
    struct Tools
    {
        // Copies sizeof(Header) raw bytes (padding included) in host byte order.
        std::shared_ptr<Header> deserializeHeader(const std::vector<unsigned char> &loadedBuffer)
        {
            std::shared_ptr<Header> header(new Header);
            std::memcpy(header.get(), &loadedBuffer[0], sizeof(Header));
            return header;
        }
        std::vector<unsigned char> serializeHeader(const Header &header)
        {
            std::vector<unsigned char> buffer;
            buffer.resize(sizeof(Header));
            std::memcpy(&buffer[0], &header, sizeof(Header));
            return buffer;
        }
    }
    tools;

    int main()
    {
        Header header = {'B', 5834, 4665};
        auto v1 = tools.serializeHeader(header);
        auto v2 = tools.deserializeHeader(v1);
    }
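
For comparison, here is a hedged sketch of the same pair of routines without memcpy, writing each field byte by byte. The big-endian 10-byte wire layout (2 + 4 + 4), the `kWireSize` constant and the `appendBigEndian` / `readBigEndian` helpers are assumptions for illustration and are not part of the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Header
{
    char16_t type;
    std::int32_t version;
    std::int32_t length;
};

// Assumed wire layout: big-endian, 2 + 4 + 4 = 10 bytes, no padding.
constexpr std::size_t kWireSize = 10;

// Append the low byteCount bytes of value, most significant byte first.
void appendBigEndian(std::vector<unsigned char> &out, std::uint32_t value, int byteCount)
{
    for (int shift = 8 * (byteCount - 1); shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((value >> shift) & 0xFF));
}

// Read byteCount bytes starting at offset, most significant byte first.
std::uint32_t readBigEndian(std::vector<unsigned char> const &in, std::size_t offset, int byteCount)
{
    std::uint32_t value = 0;
    for (int i = 0; i < byteCount; ++i)
        value = (value << 8) | in[offset + i];
    return value;
}

std::vector<unsigned char> serializeHeader(Header const &header)
{
    std::vector<unsigned char> buffer;
    buffer.reserve(kWireSize);
    appendBigEndian(buffer, header.type, 2);
    appendBigEndian(buffer, static_cast<std::uint32_t>(header.version), 4);
    appendBigEndian(buffer, static_cast<std::uint32_t>(header.length), 4);
    return buffer;
}

Header deserializeHeader(std::vector<unsigned char> const &buffer)
{
    // The caller must supply at least the full wire header.
    assert(buffer.size() >= kWireSize);
    Header header;
    header.type = static_cast<char16_t>(readBigEndian(buffer, 0, 2));
    header.version = static_cast<std::int32_t>(readBigEndian(buffer, 2, 4));
    header.length = static_cast<std::int32_t>(readBigEndian(buffer, 6, 4));
    return header;
}
```

The serialized size is then fixed by the wire layout rather than by `sizeof(Header)`, so padding and host byte order no longer affect the bytes that are sent.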

answered Aug 18 '16 at 13:33 by fgrdn

don't use `memcpy` to serialize or deserialize data, it's dangerous: alignment trouble (`sizeof(Header) != 5`, see [here](http://coliru.stacked-crooked.com/a/7cce3c50ac756044)), not portable (endianness), and unsafe (see alignment restrictions and aliasing rules) – Garf365 Aug 18 '16 at 14:37
`pragma pack` can resolve the alignment trouble but not the other problems. Also, when changing alignment with pragma pack, don't forget to restore it right afterwards. And finally, you add some portability issues because you can't be sure that the target system supports unaligned access – Garf365 Aug 18 '16 at 14:42
@Garf365 you clearly know that all the problems have solutions. i gave an abstract idea. you can pack the data, **of course you have to restore alignment**, i thought everybody knows about that so i didn't talk much about `pragma`. you can also detect whether it's big or little endian, and that's why there are special serialization/deserialization functions, where you can do whatever you want. memcpy is also not such a terrible thing, if your hands don't grow from your bottom :) – fgrdn Aug 18 '16 at 14:57
