Convert Unsigned Char * to uint16_t *

Question

I have unsigned char * data like this:

unsigned char * data = ...;
data[0]=0x24;
data[1]=0x8A;
data[2]=0x07;
data[3]=0x75;

I want to convert it to uint16_t with below code:

uint16_t *res= (uint16_t*)(data);

The output will be :

res[0]= 0x8A24; res[1]=0x7507

But I want res become like this:

res[0]=0x248A; res[1]=0x0775;

How can I change my code?

What are you really trying to do? Convert ASCII or UTF8 strings to UTF16? You can't just do a cast, you need to know the encoding/codepage involved. And `uint16_t` is the wrong type for UTF16, the correct type is `char16_t` and `u16string` for strings — Panagiotis Kanavos, Aug 21 '20 at 09:52
Does this answer your question? [How do I convert between big-endian and little-endian values in C++?](https://stackoverflow.com/questions/105252/how-do-i-convert-between-big-endian-and-little-endian-values-in-c) — Susmit Agrawal, Aug 21 '20 at 09:55
@PanagiotisKanavos I want to convert unsigned char * to uint16_t, but I do not know the encoding. How can I find out the encoding? — Alexanov, Aug 21 '20 at 09:59
`I do not know the encodin` You do know, you specified the output. — KamilCuk, Aug 21 '20 at 10:00
@SusmitAgrawal I think my problem is about little-endian versus big-endian, but I do not know how to solve it with an easy solution that does not add a separate function. — Alexanov, Aug 21 '20 at 10:02
@Alexanov if you talk about encoding, you *don't* want to convert characters to integers at all. You want to convert characters to characters. As for the encoding, it's either UTF16LE or UTF16BE, and you *assume* you want to convert from one to the other. Are you sure? What do the strings actually look like? It's quite possible the problem is only caused by the attempt to cast. Where does the data come from? A stream? Network input? Have you tried reading that data as Unicode strings instead of `char`? — Panagiotis Kanavos, Aug 21 '20 at 10:03
@Alexanov if you use a `std::basic_ifstream` you'll be able to read `char16_t` strings directly — Panagiotis Kanavos, Aug 21 '20 at 10:04
@PanagiotisKanavos I don't see anything in the OP talking about text and character encoding. It *smells* more like a basic endianness decoding of binary data. — xryl669, Aug 21 '20 at 10:10

xryl669 · Answer 1 · 2020-08-21T10:07:32.863

If you can use your operating system's headers, the ntohs and htons functions can swap the bytes for you (they swap on a little-endian host and do nothing on a big-endian one).
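A rough sketch of the ntohs route, assuming a POSIX system where ntohs is declared in <arpa/inet.h> (on Windows it comes from <winsock2.h> instead). The helper name load_be16_ntohs is made up for illustration. It copies the two bytes with memcpy rather than casting the pointer, then lets ntohs convert from big-endian (network) order to whatever the host uses:

```cpp
#include <arpa/inet.h>  // ntohs (POSIX header; an assumption about the platform)
#include <cstdint>
#include <cstring>

// Hypothetical helper: read one big-endian 16-bit value from a byte buffer.
inline std::uint16_t load_be16_ntohs(const unsigned char* p) {
  std::uint16_t raw;
  std::memcpy(&raw, p, sizeof raw);  // copy the bytes; no pointer cast needed
  return ntohs(raw);                 // network (big-endian) order -> host order
}
```

With the bytes from the question, load_be16_ntohs(data) should give 0x248A and load_be16_ntohs(data + 2) should give 0x0775, on both little-endian and big-endian hosts.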

Otherwise, it's trivial to write your own:

#include <cstddef>  // std::size_t
#include <cstring>  // std::memcpy
#include <utility>  // std::swap

template <typename T> T swap_endianness(T in) {
  unsigned char array[sizeof(in)] = {};
  std::memcpy(array, &in, sizeof(in));
  // Reverse the byte order in place
  for (std::size_t i = 0; i < sizeof(in)/2; i++)
    std::swap(array[i], array[sizeof(in) - i - 1]);
  std::memcpy(&in, array, sizeof(in));
  return in;
}

Usage is:

uint16_t *res= (uint16_t*)(data); // This is undefined behavior. Use memcpy instead.
res[0] = swap_endianness(res[0]); // and so on

This only works for plain-old-data types, such as uint64_t, uint32_t, uint16_t, int64_t, int32_t, int16_t, float, and double.

As @KamilCuk pointed out, the line (uint16_t *)(data) might lead to issues if data happens not to be aligned on a 16-bit boundary, and it will likely cause a memory access error when you dereference it on a platform like ARM. The correct solution is to use memcpy like this:

uint16_t res[2];
std::memcpy(res, data, sizeof(res));
res[0] = swap_endianness(res[0]);
res[1] = swap_endianness(res[1]);

score 1 · Accepted Answer · answered Aug 21 '20 at 10:00

> But I want res become like this:
>
> How can I change my code?

It's just:

res[0] = data[0] << 8 | data[1];
res[1] = data[2] << 8 | data[3];
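As a minimal sketch, the two lines above can be wrapped in small helpers so they work for any number of values. The names read_be16 and decode_be16 are hypothetical, chosen only for this example. Because the value is built with shifts, the result does not depend on the endianness of the host:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: assemble a 16-bit value from two bytes stored
// most-significant byte first (big-endian).
inline std::uint16_t read_be16(const unsigned char* p) {
  return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Hypothetical helper: decode `count` big-endian 16-bit values from `data`
// into `out`. `data` must hold at least 2 * count bytes.
inline void decode_be16(const unsigned char* data, std::uint16_t* out,
                        std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    out[i] = read_be16(data + 2 * i);
}
```

Calling decode_be16(data, res, 2) on the four bytes from the question fills res with 0x248A and 0x0775.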

> I want to convert it to uint16_t with below code:
> uint16_t *res= (uint16_t*)(data);
> The output will be :

Such code is invalid, and dereferencing res results in undefined behavior. This is a strict aliasing violation, and data may not be aligned to alignof(uint16_t). If the compiler happens to generate code that doesn't hard-fault on the unaligned access, the result depends on the endianness of the target machine. Either way, it is undefined behavior.
