
Given a char buffer c containing an int (little endian), how do I read it as an int32_t?

I wrote this code, but it doesn't feel like idiomatic C++.

int32_t v;
char* p = (char*)&v;
for (int i=0; i < 4; i++) {
    *(p+i) = *(c+i);
}
iksemyonov
Pierrot
  • @DavidHaim It makes sense. I tried int32_t v = static_cast<int32_t>(*c); but it does not work. – Pierrot Oct 07 '17 at 12:06
  • @DavidHaim E.g. that casting fails on big-endian machines? Or that, depending on the usage, you get UB (strict aliasing)? – deviantfan Oct 07 '17 at 12:06
  • What is wrong with using bit-shifting and | operators? – Artemy Vysotsky Oct 07 '17 at 12:14
  • @ArtemyVysotsky Optimizers do not like this solution, but they produce optimal code with the code above or with memcpy: https://godbolt.org/g/iMy1mn – Oliv Oct 08 '17 at 22:57
  • @Oliv std::int32_t v = *reinterpret_cast<std::int32_t*>(p); => movsx eax, byte ptr [rdi] VS std::memcpy(&k,buffer,4); => mov eax, dword ptr [rdi]. I don't quite get it. – Pierrot Oct 08 '17 at 23:31
  • @Pierrot Your code is fine (as long as the int in the buffer is little endian too). What the link I provided shows is that the solution proposed by Artemy Vysotsky produces suboptimal binary code (if I got it right). Nevertheless, after thinking about it twice, the solution proposed by Artemy Vysotsky is the most robust and the only one that never results in either implementation-defined or undefined behavior. – Oliv Oct 09 '17 at 06:04
  • @Pierrot There is a debate below, and in a sense everyone is right. Actually, each of the answers you were given could be fine, depending on how your buffer was initialized. Is it: 1 - an array of char in which an int has been constructed, or the result of casting a pointer to int to char*; 2 - a buffer of char in which the representation of an int has been stored; or 3 - data coming from some I/O? – Oliv Oct 09 '17 at 06:34
  • @Oliv I did not think a simple question like that would raise such a debate. – Pierrot Oct 09 '17 at 09:59
  • @Pierrot I have asked a similar question, and the conclusion was that the C++ object model is underspecified... In your case, I think the trouble is caused by different interpretations of a phrase in your question, "a buffer of char containing an int". Even if our mental model of a simple CPU does not distinguish between two "buffers of char", the compiler does. The C++ abstract machine deals with objects, lifetimes, and types; if you violate its rules, bad surprises happen. – Oliv Oct 09 '17 at 10:12
  • The consequence is often that code written by the "unworried", to cite Stroustrup, must be compiled with -fno-strict-aliasing, which really makes the code slower. That could happen in the near future if you use the answer proposing reinterpret_cast. But for short-lived projects that will end up in the trash where they belong, this kind of code is OK. – Oliv Oct 09 '17 at 10:16
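Artemy Vysotsky's shift-and-OR suggestion can be sketched as follows. The name read_le_int32 is hypothetical, and the sketch assumes buf points to at least 4 bytes holding a little-endian value. Because the value is assembled arithmetically instead of by reinterpreting memory, it gives the same result on little- and big-endian hosts and has no alignment requirement.

```cpp
#include <cstdint>

// Decode a 32-bit little-endian value from the first 4 bytes of buf.
// Independent of host byte order; never reinterprets memory as an int.
std::int32_t read_le_int32(const char* buf)
{
    // View the bytes as unsigned char so values >= 0x80 are not sign-extended.
    const unsigned char* b = reinterpret_cast<const unsigned char*>(buf);

    std::uint32_t u = static_cast<std::uint32_t>(b[0])
                    | static_cast<std::uint32_t>(b[1]) << 8
                    | static_cast<std::uint32_t>(b[2]) << 16
                    | static_cast<std::uint32_t>(b[3]) << 24;

    // Converting a uint32_t above INT32_MAX to int32_t is implementation-defined
    // before C++20 (two's complement wraparound on mainstream compilers) and is
    // specified as modular from C++20 on.
    return static_cast<std::int32_t>(u);
}
```

Note the final signed conversion: the shifts themselves are fully defined, but producing a negative int32_t from them was only guaranteed starting with C++20.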

4 Answers


The only portable way to copy binary data from a char* buffer to any other data type is with memcpy (or an equivalent byte-copying method such as std::copy, or one of your own that mimics this behaviour).

 memcpy(&my_number, my_buffer, sizeof(my_number));

Of course, the buffer must contain the correct bits for the given data type. If it originated from copying memory of the same data type on the same machine, endianness doesn't come into play. Otherwise you have to rearrange the bytes into the required order (either in place or in a temporary buffer), or swap the bytes of the integer itself in a platform-dependent manner (perhaps with htonl and friends).

n. m. could be an AI
  • I wouldn't say that this is the only way; unions seem to be fine here too. – David Haim Oct 07 '17 at 18:04
  • There is nothing preventing you from writing your own loop or using standard algorithms like `std::copy` too. As long as you are using `char*`, `unsigned char*`, `std::byte*` which legally alias any other types. – Galik Oct 07 '17 at 18:52
  • @Galik if you can do this correctly you don't need to ask this question. – n. m. could be an AI Oct 07 '17 at 18:57
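A minimal sketch of the memcpy approach, using the hypothetical helper names read_host_int32 and write_host_int32. memcpy has no alignment requirement, so it is safe even at an odd offset inside a larger buffer. The bytes are interpreted in host byte order, so the result matches little-endian data only on a little-endian machine.

```cpp
#include <cstdint>
#include <cstring>

// Copy the object representation of an int32_t out of a char buffer.
// Works for any alignment of buf; bytes are taken in host byte order.
std::int32_t read_host_int32(const char* buf)
{
    std::int32_t v;
    std::memcpy(&v, buf, sizeof v);
    return v;
}

// The reverse direction: store v's bytes into buf in host byte order.
void write_host_int32(char* buf, std::int32_t v)
{
    std::memcpy(buf, &v, sizeof v);
}
```

For example, writing a value at buf + 1 and reading it back from buf + 1 returns the same value, even though that address is typically misaligned for an int32_t.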

There is no magic idiomatic way to handle endianness; endianness is an I/O issue. Either use the htonl() and ntohl() functions (available on POSIX systems and Windows) or decode the bytes yourself (as you are doing). I recommend writing functions to do it: they make your life easier, and they can be tested.

Dúthomhas
  • It should be noted that those functions convert to and from *big endian* (network byte order) whereas the question is asking about converting from *little endian*. – Galik Oct 07 '17 at 12:51
  • Thanks @Dúthomhas, your answer helps as well. – Pierrot Oct 08 '17 at 23:32
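As Galik notes, ntohl() decodes big-endian (network byte order) data, not the little-endian data in the question. A sketch of its intended use, assuming a POSIX system where ntohl is declared in <arpa/inet.h> (on Windows it comes from <winsock2.h>); read_be_uint32 is a hypothetical name:

```cpp
#include <arpa/inet.h>  // POSIX header declaring ntohl()
#include <cstdint>
#include <cstring>

// Decode a 32-bit BIG-endian value: copy the bytes into a uint32_t unchanged,
// then let ntohl() convert from network byte order to host byte order.
// This does NOT handle the little-endian layout from the question.
std::uint32_t read_be_uint32(const char* buf)
{
    std::uint32_t net;
    std::memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

Because ntohl() is a no-op on big-endian hosts and a byte swap on little-endian ones, the same source gives the correct value on either kind of machine, but only for big-endian input.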

If you want to solve your issue in a portable and safe manner, use memcpy as n.m.'s answer explains. Otherwise, here's a more dangerous technique:

Note that this is UB. Only use the technique below if you are completely sure that the buffer contains the right amount of data and that the buffer and data are properly aligned.

If you are sure that the endianness of the system matches that of the data stored in the char* buffer, you can use reinterpret_cast:

std::int32_t v = *reinterpret_cast<std::int32_t*>(p);

There's no Standard-compliant way of doing the above transformation. See this question for more details.

Vittorio Romeo
  • Reinterpret cast doesn't work with improperly aligned data. The behaviour is undefined. – n. m. could be an AI Oct 07 '17 at 12:30
  • @n.m. it's only undefined if the data or the buffer are not properly aligned, right? I.e. if the OP's example buffer and target `int32` are properly aligned, this will be fine. – Vittorio Romeo Oct 07 '17 at 18:04
  • My understanding is that strict aliasing is not just about alignment; it is about allowing the compiler to take shortcuts. So aliasing memory through anything other than `char*`, `unsigned char*` and (now) `std::byte*` would be undefined behavior regardless. https://stackoverflow.com/questions/9964418/strict-aliasing-and-alignment – Galik Oct 07 '17 at 18:48
  • This is undefined behavior. The answer of n.m. is the right one. – Oliv Oct 07 '17 at 20:50
  • While the whole "undefined in general, even when properly aligned" bla bla vs. `memcpy` is rather academic, I think this answer (while qualifying it at the end) DOES have a real practical problem: when casting from a character buffer, chances are really high that the data is IN FACT not correctly aligned, and this does cause problems in actual practice. So yes, I really think this answer is kinda wrong, in that having the correct alignment out of a char buffer is ... unlikely. (Although it "just works" on e.g. an x86 platform last time I checked, it's really not something we should recommend.) – Martin Ba Oct 08 '17 at 20:59

There is no standard function to discover the endianness of your system (C++20 later added std::endian in <bit>). However, given a function bool is_little_endian() that returns true only on little-endian systems, you might do something like this:

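#include <algorithm>
#include <cstdint>

bool is_little_endian(); // assumed to be provided elsewhere

std::uint32_t read_from_little_endian(const char* buf)
{
    std::uint32_t u;
    char* dst = reinterpret_cast<char*>(&u); // char* may alias any object

    if(is_little_endian())
        std::copy(buf, buf + sizeof(u), dst);         // bytes already in host order
    else
        std::reverse_copy(buf, buf + sizeof(u), dst); // big-endian host: reverse them

    return u;
}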

The important point is always to access your std::uint32_t through a char*, because only char* (along with unsigned char* and, since C++17, std::byte*) can legally alias all other types.
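The answer leaves is_little_endian() undefined. One possible definition, using only the standard library, copies the lowest-addressed byte of a known integer out with memcpy and checks whether it holds the least significant byte. Since C++20, std::endian::native == std::endian::little from <bit> answers the same question at compile time.

```cpp
#include <cstdint>
#include <cstring>

// Returns true if the host stores the least significant byte first.
// Mixed-endian layouts are not considered.
bool is_little_endian()
{
    const std::uint32_t one = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &one, 1);  // copy the lowest-addressed byte of `one`
    return first_byte == 1;
}
```

Going through memcpy (or an unsigned char*) keeps the check within the aliasing rules discussed in the comments above.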

Galik