C/C++ code to convert big endian to little endian

Question

I've seen several different examples of code that converts big endian to little endian and vice versa. I've come across a piece of code someone wrote that seems to work, but I'm stumped as to why it does.

Basically, there's a char buffer that, at a certain position, contains a 4-byte int stored as big-endian. The code would extract the integer and store it as native little endian. Here's a brief example:

char test[8] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07};
char *ptr = test;
int32_t value = 0;
value =  ((*ptr) & 0xFF)       << 24;
value |= ((*(ptr + 1)) & 0xFF) << 16;
value |= ((*(ptr + 2)) & 0xFF) << 8;
value |= (*(ptr + 3)) & 0xFF;
printf("value: %d\n", value);

value: 66051

The above code takes the first four bytes, stores them as little endian, and prints the result. Can anyone explain step by step how this works? I'm confused why ((*ptr) & 0xFF) << X wouldn't just evaluate to 0 for any X >= 8.

Because `char` values are promoted to `int` before arithmetic is done. Note: should be using `unsigned char *` and `uint32_t`. — Weather Vane, Jul 25 '17 at 15:27
Your code is independent of endianness: it will print `66051` on both little- and big-endian machines. `value` is stored in the endianness of the machine, not always in little endian. — mch, Jul 25 '17 at 15:33
The `& 0xFF` is only necessary for signed values, to strip off the extra bits when a negative `char` value is sign-extended to `int`. One reason to use `unsigned`, as well as dubious shifting into the sign bit. — Weather Vane, Jul 25 '17 at 15:38
@WeatherVane: Using signed char and signed integers is indeed not beautiful, but does not change anything of the functionality of this swapping procedure. — Tom Kuschel, Jul 25 '17 at 19:15
@TomKuschel as I mentioned, shifting bits into the sign bit of a signed `int` is bad. Specifically, where the bit to be shifted into the sign bit, is different from the sign bit. — Weather Vane, Jul 25 '17 at 19:18

score 2 · Answer 1 · answered Jul 25 '17 at 15:32

This code is constructing the value, one byte at a time.

First it reads the first byte (the one at the lowest address) and masks it to 8 bits

 (*ptr) & 0xFF

And then shifts it into the highest (most significant) byte position

 ((*ptr) & 0xFF) << 24

And then assigns it to value, which was previously initialized to 0.

 value =((*ptr) & 0xFF) << 24

Now the "magic" comes into play. Since ptr was declared as a char*, adding one to it advances the pointer by one character.

 (ptr + 1) /* the next character address */
 *(ptr + 1) /* the next character */

Once you see that they are using pointer math to step through the addresses, the remaining operations are the same as those already described, except that, to preserve the bytes already placed, they OR each new value into the existing value variable

 value |= ((*(ptr + 1)) & 0xFF) << 16

Note that pointer math is why you can do things like

 char* ptr = ... some value ...

 while (*ptr != 0) {
     ... do something ...
     ptr++;
 }

but it comes at the price of possibly corrupting your pointer addresses, which greatly increases the risk of a segmentation fault (SEGFAULT). Some languages saw this as such a problem that they removed the ability to do pointer math. An almost-pointer that you cannot do pointer math on is typically called a reference.

Sir Jo Black · Answer 2 · 2017-07-25T21:23:50.623

One approach relies on the convention that numbers sent over a network are in big-endian order (network byte order).

The functions htonl() and htons() convert a 32-bit and a 16-bit integer, respectively, from host byte order to big endian. On a little-endian system they swap the bytes; on a big-endian system they return the value unchanged.

Here is the code:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <arpa/inet.h>

int main(void)
{
    uint32_t x,y;
    uint16_t s,z;

    x=0xFF567890;

    y=htonl(x);

    printf("LE=%08" PRIX32 " BE=%08" PRIX32 "\n",x,y);

    s=0x7891;

    z=htons(s);

    printf("LE=%04X BE=%04X\n",s,z);

    return 0;

}

This code converts from LE to BE when run on an LE machine.

You can use the opposite functions, ntohl() and ntohs(), to convert from BE to LE. These functions swap the bytes on LE machines and leave the value unchanged on BE machines.

score 1 · Answer 3 · answered Jul 25 '17 at 17:19

If you want to convert a little-endian representation to big endian, you can use htonl, htons, ntohl, and ntohs. These functions convert values between host and network byte order. Big endian is also used on some ARM-based platforms. See here: https://linux.die.net/man/3/endian

Milad Kahsari Alhadi


As far as I know, the default endianness of ARM is little endian. Most ARM systems run little endian, though you may switch to big endian (I never saw that in any project). ARM, and Intel since the 486, provide native byte-swapping instructions. See here: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0210c/Cihceeec.html – Tom Kuschel Jul 25 '17 at 18:45
This answer is only correct *for code running on a little-endian host*. For code running on a big-endian host (HP-UX springs to mind), then `htonl()` and friends are all no-ops. – Toby Speight Jul 26 '17 at 08:37
Why is there no integer equivalent? – neoexpert Mar 09 '19 at 14:11

score 0 · Answer 4 · answered Jul 25 '17 at 18:57

I'm confused why ((*ptr) & 0xFF) << X wouldn't just evaluate to 0 for any X >= 8.

I think you misinterpret the shift functionality.

value = ((*ptr) & 0xFF) << 24;

means masking the value at ptr with 0xFF (one byte) and then shifting it left by 24 BITS (not bytes). That is a shift by 24/8 = 3 bytes, into the highest byte.

score 0 · Answer 5 · answered Jul 25 '17 at 21:44

One of the key points to understanding the evaluation of ((*ptr) & 0xFF) << X is integer promotion. The operand *ptr is promoted to int before the & is applied, so (*ptr) & 0xFF is already an int when it is shifted.

Steen


Sir Jo Black · Answer 6 · 2017-07-26T12:11:49.167

I've written the code below. It contains two functions, swapmem() and swap64().

swapmem() swaps the bytes of a memory area of arbitrary size.
swap64() swaps the bytes of a 64-bit integer.

At the end of this reply I suggest a way to solve your problem with the byte buffer.

Here is the code:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdlib.h>

void * swapmem(void *x, size_t len, int retnew);
uint64_t swap64(uint64_t k);

/**
    brief swapmem

         This function reverses the order of the bytes in a memory buffer.

    param x
         pointer to the buffer to be swapped

    param len
         length of the buffer to be swapped

    param retnew
         If this parameter is nonzero, the swapped bytes are written
         to a newly allocated buffer. The new buffer shall be
         deallocated with free() when it is no longer needed.

         If this parameter is 0, the buffer is swapped in place.

    return
        A pointer to the memory area where the bytes have been
        swapped, or NULL if an error occurs.
*/
void * swapmem(void *x, size_t len, int retnew)
{
    char *b = NULL, app;
    size_t i;

    if (x != NULL) {
        if (retnew) {
            b = malloc(len);
            if (b!=NULL) {
                for(i=0;i<len;i++) {
                    b[i]=*((char *)x+len-1-i);
                }
            }
        } else {
            b=(char *)x;
            for(i=0;i<len/2;i++) {
                app=b[i];
                b[i]=b[len-1-i];
                b[len-1-i]=app;
            }
        }
    }
    return b;
}

uint64_t swap64(uint64_t k)
{
    return ((k << 56) |
            ((k & 0x000000000000FF00) << 40) |
            ((k & 0x0000000000FF0000) << 24) |
            ((k & 0x00000000FF000000) << 8) |
            ((k & 0x000000FF00000000) >> 8) |
            ((k & 0x0000FF0000000000) >> 24)|
            ((k & 0x00FF000000000000) >> 40)|
            (k >> 56)
           );
}

int main(void)
{
    uint32_t x,*y;
    uint16_t s,z;
    uint64_t k,t;

    x=0xFF567890;

    /* Dynamic allocation is used to avoid changing the contents of x */
    y=(uint32_t *)swapmem(&x,sizeof(x),1);
    if (y!=NULL) {
        printf("LE=%08X BE=%08X\n",x,*y);
        free(y);
    }

    /* Dynamic allocation is not used. The contents of z and k will change */
    z=s=0x7891;
    swapmem(&z,sizeof(z),0);
    printf("LE=%04X BE=%04X\n",s,z);

    k=t=0x1120324351657389;
    swapmem(&k,sizeof(k),0);
    printf("LE=%16" PRIX64 " BE=%16" PRIX64 "\n",t,k);

    /* LE64 to BE64 (or vice versa) using shifts */
    k=swap64(t);
    printf("LE=%16" PRIX64 " BE=%16" PRIX64 "\n",t,k);

    return 0;
}

After compiling the program I was curious to see the assembly code gcc generated. I discovered that swap64() compiles to the following:

00000000004007a0 <swap64>:
  4007a0:       48 89 f8                mov    %rdi,%rax
  4007a3:       48 0f c8                bswap  %rax
  4007a6:       c3                      retq

This result is obtained by compiling the code on a PC with an Intel i3 CPU, using any of the gcc options -Ofast, -O3, -O2, or -Os.

You may solve your problem with something like the swap64() function. Here is a 32-bit equivalent, which I've named swap32():

uint32_t swap32(uint32_t k)
{
    return ((k << 24) |
            ((k & 0x0000FF00) << 8) |
            ((k & 0x00FF0000) >> 8) |
            (k >> 24)
           );
}

You may use it as:

uint32_t j=swap32(*(uint32_t *)ptr);
