I am using C to read a .png image file. In case you're not familiar with the PNG format, its useful integer values are stored as 4-byte big-endian integers.

My computer is a little-endian machine, so to convert from a big-endian uint32_t that I read from the file with fread() to a little-endian one my computer understands, I've been using this little function I wrote:

#include <stdint.h>

uint32_t convertEndian(uint32_t val){
  union{
    uint32_t value;
    char bytes[sizeof(uint32_t)];
  }in,out;
  in.value=val;
  for(int i=0;i<sizeof(uint32_t);++i)
    out.bytes[i]=in.bytes[sizeof(uint32_t)-1-i];
  return out.value;
}

This works beautifully in my x86_64 UNIX environment, and gcc compiles it without error or warning even with the -Wall flag. Still, I feel rather confident that I'm relying on undefined behavior and on type-punning that may not work as well on other systems.

Is there a standard function I can call that can reliably convert a big-endian integer to one the native machine understands, or if not, is there an alternative safer way to do this conversion?

Willis Hershey
  • 1
    You can use good ol' shifts for unsigned types. Not sure about signed ones, but it certainly can't be impossible. – Oppen May 21 '20 at 20:26
  • 4
    What about `htonl()` and `ntohl()`? – Fred Larson May 21 '20 at 20:28
  • 1
    I think you mean `ntohl()` – Barmar May 21 '20 at 20:29
  • @Barmar: Yup, got the letters jumbled. – Fred Larson May 21 '20 at 20:31
  • htonl() and ntohl() rely on the `arpa/inet.h` file which is not available on non-UNIX systems – Willis Hershey May 21 '20 at 20:33
  • @FredLarson I think it is making an unnecessary assumption on the endianness of the "network" – Eugene Sh. May 21 '20 at 20:33
  • 2
    Use `uint8_t bytes` instead of `char bytes`. On rare machines where `char` is not 8 bits, code will not compile rather than compile and perform incorrectly. – chux - Reinstate Monica May 21 '20 at 20:34
  • Does this answer your question? [convert big endian to little endian in C \[without using provided func\]](https://stackoverflow.com/questions/2182002/convert-big-endian-to-little-endian-in-c-without-using-provided-func) – Fred Larson May 21 '20 at 20:35
  • @EugeneSh. not really. The naming of the functions is bad, but the specification is clear about it: "network" == "big" for those. And the program itself, OP says its checking the byte order provided by the PNG file first. – Oppen May 21 '20 at 20:36
  • 1
    Note that `convertEndian()` is doing an endian swap and not a "convert a big-endian integer to one the native machine understands". I'd expect a `big_to_host32()` would be a better approach. – chux - Reinstate Monica May 21 '20 at 20:40
  • 1
    From a naming POV, `ntohl()` implies network-to-long, yet "network" implies "big" even if some attached network protocol used "little" endian and "long" implies 32-bit, even if `long` is 64-bit. I like ones like `be32toh()` better. – chux - Reinstate Monica May 21 '20 at 21:07
  • 1
    Your code doesn't rely on UB, but it does rely on your machine being little-endian – M.M May 21 '20 at 21:34
  • unfortunately there's no compile-time endian detection in Standard C, but GCC does provide predefined macros – M.M May 21 '20 at 21:36

4 Answers

3

I see no real UB in OP's code.

Portability issues: yes.

"type-punning that may not work as well on other systems" is not a problem with OP's C code yet may cause trouble with other languages.


Yet how about a big-endian (PNG) to host conversion instead?

Extract the bytes by address (lowest address which has the MSByte to highest address which has the LSByte - "big" endian) and form the result with the shifted bytes.

Something like:

#include <stdint.h>

uint32_t Endian_BigToHost32(uint32_t val) {
  union {
    uint32_t u32;
    uint8_t u8[sizeof(uint32_t)]; // uint8_t ensures a byte is 8 bits.
  } x = { .u32 = val };
  // u8[0] sits at the lowest address, which holds the MSByte in big-endian data.
  return
      ((uint32_t)x.u8[0] << 24) |
      ((uint32_t)x.u8[1] << 16) |
      ((uint32_t)x.u8[2] <<  8) |
                 x.u8[3];
}

Tip: many libraries have an implementation-specific function to do this efficiently. Example: be32toh.
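
As a rough illustration of that tip, here is a minimal sketch that assumes a Linux/glibc system, where `be32toh()` is declared in the non-standard header `<endian.h>` (the BSDs provide it in `<sys/endian.h>` instead, and other platforms may not have it at all). The helper name `load_be32_with_be32toh` is made up for this example.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; asks it to expose be32toh() */
#include <endian.h>       /* non-standard header, not part of ISO C */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode 4 bytes exactly as they appear in the file. */
uint32_t load_be32_with_be32toh(const unsigned char bytes[4])
{
    uint32_t raw;
    memcpy(&raw, bytes, sizeof raw);  /* same bit pattern fread() would store */
    return be32toh(raw);              /* identity on big-endian hosts, byte swap on little-endian ones */
}
```

Because the conversion depends on the host, the same call should yield the same value on either kind of machine, which is what an unconditional swap cannot offer.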

chux - Reinstate Monica
  • This function does not correctly reverse the endianness of an input, and even if it did, it still accesses an unset member of a union, which is the same potentially hazardous behavior as the code in the original question – Willis Hershey May 24 '20 at 00:12
  • @WillisHershey "it still accesses an unset member of a union" --> is incorrect. What unset member is that? Did you forget `= { .u32 = val };`? This does reverse the endianness if "My computer is a little-endian machine" is true. – chux - Reinstate Monica May 24 '20 at 02:01
  • I was mistaken, your function does work. The situation I was trying to avoid was using unions to pretend a 4-byte integer was 4 1-byte integers. The only improvements here are replacing char with `uint8_t` and removing the loop, which are steps in the right direction, but don't completely solve the problem – Willis Hershey May 24 '20 at 02:57
  • @WillisHershey There is no pretending, just C specification and [.png](https://en.wikipedia.org/wiki/Portable_Network_Graphics#"Chunks"_within_the_file) compliant code and no UB. .png files have "4-byte big-endian integers" in one particular endianness: big. This code converts that faithfully to the local `uint32_t`. What part of the problem do you see as not completely solved? – chux - Reinstate Monica May 24 '20 at 03:13
2

IMO it'd be better style to read from bytes into the desired format, rather than apparently memcpy'ing a uint32_t and then internally manipulating the uint32_t. The code might look like:

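#include <stdint.h>

uint32_t read_be32(const uint8_t *src)   // must be unsigned input
{
    // Multiplying by an unsigned constant converts each promoted byte to unsigned first.
    return (src[0] * 0x1000000u) + (src[1] * 0x10000u) + (src[2] * 0x100u) + src[3];
}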

It's quite easy to get this sort of code wrong, so make sure you get it from high-rep SO users. You may often see the alternative suggestion `return (src[0] << 24) + (src[1] << 16) + (src[2] << 8) + src[3];` however, that causes undefined behaviour if src[0] >= 128 due to signed integer overflow, a consequence of the unfortunate rule that the integer promotions take uint8_t to signed int. It also causes undefined behaviour on a system with 16-bit int due to the large shifts.

Modern compilers should be smart enough to optimize this; e.g. the assembly produced by clang for a little-endian target is:

read_be32:                              # @read_be32
    mov     eax, dword ptr [rdi]
    bswap   eax
    ret

However, I see that gcc 10.1 produces much more complicated code; this seems to be a surprising missed-optimization bug.
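
If you prefer the shift form, the promotion problem described above can be sidestepped by widening each byte to `uint32_t` before shifting, so no signed `int` is ever shifted. A minimal sketch (the name `read_be32_shift` is made up for this example):

```c
#include <stdint.h>

/* Hypothetical name: shift-based variant with each byte widened first. */
uint32_t read_be32_shift(const uint8_t *src)
{
    /* The casts happen before the shifts, so every shift is done on an
       unsigned 32-bit value, including when src[0] >= 128. */
    return ((uint32_t)src[0] << 24) |
           ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] <<  8) |
            (uint32_t)src[3];
}
```

This should compute the same value as the multiplication version; which one a given compiler optimizes better is something to check on your own toolchain.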

M.M
0

This solution doesn't rely on accessing inactive members of a union. Instead it relies on unsigned integer bit-shift operations, which can portably and safely convert from big-endian to little-endian or vice versa.

#include <stdint.h>

uint32_t convertEndian32(uint32_t in){
  return ((in & 0xffu)       << 24) |
         ((in & 0xff00u)     <<  8) |
         ((in & 0xff0000u)   >>  8) |
         ((in & 0xff000000u) >> 24);
}
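
Note that this is an unconditional swap, so on a big-endian host it would turn a correct value into a wrong one. A possible way to make it host-aware, sketched here with made-up helper names (`swap32`, `host_is_little_endian`, `big_to_host32`), is to probe the byte order at run time and swap only when needed. The sketch assumes the host is either plain little-endian or plain big-endian, not a mixed-endian oddity.

```c
#include <stdint.h>
#include <string.h>

/* Same unconditional swap as convertEndian32(), repeated so this compiles alone. */
uint32_t swap32(uint32_t in)
{
    return ((in & 0xffu) << 24) | ((in & 0xff00u) << 8) |
           ((in & 0xff0000u) >> 8) | ((in & 0xff000000u) >> 24);
}

/* Runtime probe: does the lowest-addressed byte of the value 1 hold the 1? */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Hypothetical helper: raw is the value as fread() stored it from a big-endian file. */
uint32_t big_to_host32(uint32_t raw)
{
    return host_is_little_endian() ? swap32(raw) : raw;
}
```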
Willis Hershey
0

This code reads a uint32_t stored in big-endian order through a pointer to uchar_t (used here to mean unsigned char), independently of the endianness of your architecture. The code simply acts as if it were reading a base-256 number.

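#include <stdint.h>

typedef unsigned char uchar_t;  /* uchar_t is not a standard C type */

uint32_t read_bigend_int(const uchar_t *p, int sz)
{
    uint32_t result = 0;
    while(sz--) {
        result <<= 8;   /* multiply by base */
        result |= *p++; /* and add the next digit */
    }
    return result;
}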

You can call it like this, for example:

int main()
{
    /* ... */
    uchar_t buff[1024];
    read(fd, buff, sizeof buff);

    uint32_t value = read_bigend_int(buff + offset, sizeof value);
    /* ... */
}
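
For the `fread()` case from the question, the same base-256 idea could be wrapped as below. This is only a sketch: `decode_be32` and `fread_be32` are made-up names, and the `FILE *` is assumed to be already open in binary mode and positioned at a 4-byte big-endian field.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: interpret 4 bytes as a base-256, big-endian number. */
uint32_t decode_be32(const unsigned char bytes[4])
{
    uint32_t result = 0;
    for (int i = 0; i < 4; ++i)
        result = (result << 8) | bytes[i];  /* shift in one digit at a time */
    return result;
}

/* Hypothetical helper: returns 1 on success, 0 on a short read or I/O error. */
int fread_be32(FILE *fp, uint32_t *out)
{
    unsigned char bytes[4];
    if (fread(bytes, 1, sizeof bytes, fp) != sizeof bytes)
        return 0;
    *out = decode_be32(bytes);
    return 1;
}
```

Reading into a byte array first means the value never exists in memory as a wrongly ordered `uint32_t`, so no swap step is needed afterwards.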
Luis Colorado