Portable conversion of data endianness using the Linux kernel's API

Question

How can I improve the following code, that is, make it more robust with respect to type safety and endianness using the functions and macros in the Linux kernel's API? For instance, in the following example src_data is an array of two 16-bit signed integers (typically stored in little endian order) and is to be sent out via UART in big endian byte order.

s16 src_data[2] = {...}; /* note: this is signed data! */
u8 tx_data[4];

u8* src_data_u8 = (u8*)src_data;

tx_data[0] = src_data_u8[1];
tx_data[1] = src_data_u8[0];
tx_data[2] = src_data_u8[3];
tx_data[3] = src_data_u8[2];

I think the functions cpu_to_be16 and cpu_to_be16p should play a role in doing this conversion. Although I'm not sure how I can use them in a way that is safe and robust to endianness.

What exactly is the issue you want to solve, that the comments in the header file you link do not help you with? Make your question more specific. — Michael Foukarakis, May 28 '18 at 09:35
If you write code like this, careful with the de-serialization of incoming protocols. You cant go from lets say `u8 rx_data[4]` to `s16 [2]` by using pointer conversions - that would be a strict aliasing violation. You get away with it here because you go from "any type" to character type, which is a special allowed case. — Lundin, May 28 '18 at 09:40
@MichaelFoukarakis The functions/macros in the header file don't seem safe since they only accept unsigned arguments... Furthermore, it is not clear to me how I can safely store the intermediate result before putting the converted data into my `u8 tx_data` array... — mallwright, May 28 '18 at 09:56
You need to consider those functions in pair, i.e. `cpu_to_be16()` and `be16_to_cpu()`. Also you might need to use `get_unaligned()` since not every host architecture may access to the data in unaligned manner. To be sure you are on the right track, consider the protocol design first. When you will have it done and tested, you may implement the real support on host system and slave device. — 0andriy, May 28 '18 at 21:59
Thanks for the input @0andriy, the protocol is fixed as big endian, and the target machine (an AVR) is little endian. As I understand, in practice, pretty much all Linux based machines are also little endian, but nonetheless, I would like to know how to achieve this conversion in a clean, platform independent way. — mallwright, May 29 '18 at 13:54
See my responses in the chat. Just in case, the both answer below are simple wrong. Don't follow them if you are going to submit your code to the Linux kernel. — 0andriy, May 29 '18 at 16:58
@0andriy It is fairly clear that you have never written a single line of hardware-related programming in your life. Sorry, but TCP/IP sockets is not hardware, it is layer 3 and 4 or so. I write drivers for embedded systems everyday and have done so for the past 15 years, but thanks for sharing your wisdom about something you have never worked with. — Lundin, May 30 '18 at 06:30
@Lundin there were two in one: first part of my reply was about hw, second — about protocol design. Thank you for your work we never see. Care to continue your brag sheet by referring to your GitHub page(s)? — 0andriy, May 30 '18 at 08:58

score 0 · Answer 1 · edited Jan 18 '22 at 12:17

0

As I understand, two 16-bit words, to be sent, one after another, after converting each into bigendian format. I think following should be ok.

s16 src_data[2] = {...}; /* note: this is signed data! */
s16 tx_data[2];
tx_data[0] = cpu_to_be16(src_data_u8[0]);
tx_data[1] = cpu_to_be16(src_data_u8[1]);

edited Jan 18 '22 at 12:17

Luca Ceresoli

1,591
8
19

answered Dec 20 '19 at 15:12

Parikshit Pareek

39
3

Michael Foukarakis · Answer 2 · 2018-05-28T23:51:10.370

-1

Your issue with safety seems to be that the htons(x) function/macro expects an unsigned integer, but you possess a signed one. Not an issue:

union {
    int16_t signed_repr;
    uint16_t unsigned_repr;
} data;

data.signed_repr = ...;

u16 unsigned_big_endian_data = htons(data.unsigned_repr);

memcpy(tx_data, &unsigned_big_endian_data,
       min(sizeof tx_data, sizeof unsigned_big_endian_data));

PS. Type-punning via unions is perfectly well-defined.

edited May 28 '18 at 23:51

answered May 28 '18 at 10:15

Michael Foukarakis

39,737
6
87
123

Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/171979/discussion-on-answer-by-michael-foukarakis-portable-conversion-of-data-endiannes). – May 29 '18 at 09:36
Just would like to keep it here: https://davmac.wordpress.com/2010/02/26/c99-revisited/ – 0andriy May 30 '18 at 19:28

score -1 · Accepted Answer · answered Jun 06 '18 at 08:20

I believe the following is one of the best answers to my question. I have used the links provided by @0andriy to existing examples in the kernel source code.

Converting a signed 16-bit value for transmitting

s16 src = -5;
u8 dst[2];
__be16 tx_buf;
*(__be16*)dst = cpu_to_be16(src);

Converting multiple signed 16-bit values for transmitting

s16 src[2] = {-5,-2};
u8 dst[4];
s16* psrc = src;
u8* pdst = dst;
int len = sizeof(src);

for ( ; len > 1; len -= 2) {
    *(__be16 *)pdst = cpu_to_be16p(psrc++);
    pdst += 2;
}

A quick disclaimer, I still need to check if this code is correct / compiles.

Overall, I'm a bit unsatisfied with the solution for copying and converting the endianness of multiple values since it is prone to typos and could easily be implemented into a macro.

Lundin · Answer 4 · 2018-05-28T10:50:13.233

-2

If the Linux machine will always be little endian, and the protocol will always be big endian, then the code works fine and you don't need to change anything.

If you for some reason need to make the Linux code endian-independent, then you'd use:

tx_data[0] = ((unsigned int)src_data[0] >> 8) & 0xFF;
tx_data[1] = ((unsigned int)src_data[0] >> 0) & 0xFF;
tx_data[2] = ((unsigned int)src_data[1] >> 8) & 0xFF;
tx_data[3] = ((unsigned int)src_data[1] >> 0) & 0xFF;

Where the cast is there to ensure that the right shifts are not carried out on a signed type, which would invoke non-portable implementation defined behavior.

The advantage of bit shifts compared to any other version is that they work on an abstraction level above the hardware and endianess, letting the specific compiler generate the instructions for the underlying memory access. Code such as u16 >> 8 always means "give me the least significant byte" regardless of where that byte is stored in memory.

edited May 28 '18 at 10:50

answered May 28 '18 at 09:38

Lundin

195,001
40
254
396

This is almost a complete answer. If you could add a short explanation as to why this solution is endian-independent and confirm that (or speculate as to why) there isn't already a function in the Linux kernel's API to do or assist this conversion, I will accept the answer. – mallwright May 28 '18 at 10:00
1

More specifically, why would `tx_data[0]` contain the MSB of `src_data[0]`? On a little endian machine, I would have thought that `((unsigned int)src_data[0] >> 8)` would give you the LSB, since the LSB is at the lower address... – mallwright May 28 '18 at 10:07
1

@allsey87 I added a bit of explanation. Basically, C bit shifts don't know or care where bytes are stored in memory. There's probably some library functions, but this is such fundamental stuff that I see no reason for using a function. – Lundin May 28 '18 at 10:52
I see! So essentially, any data type in C can be visualised as a series of adjacent bytes with the MSB on the left and the LSB on the right, regardless of endianness. – mallwright May 28 '18 at 12:03
@allsey87 Only integer data types, since these are the only ones we can shift. – Lundin May 28 '18 at 13:13
No, please, don't do that in Linux kernel. – 0andriy May 28 '18 at 22:04

Portable conversion of data endianness using the Linux kernel's API

4 Answers4