
Rob Pike's popular blog post "The byte order fallacy", which is often linked in discussions about endianness, implies in several places that one should avoid byte swapping and calls it a sure sign of trouble. The author goes as far as saying:

byte-swapping is the surest indicator the programmer doesn't understand how byte order works.

I will now show an example of byte swapping that I stole from this post:

#include <stdint.h>

/* Reverse the order of the 8 bytes in k. */
uint64_t swap64(uint64_t k)
{
    return ((k << 56) |
            ((k & 0x000000000000FF00) << 40) |
            ((k & 0x0000000000FF0000) << 24) |
            ((k & 0x00000000FF000000) << 8) |
            ((k & 0x000000FF00000000) >> 8) |
            ((k & 0x0000FF0000000000) >> 24) |
            ((k & 0x00FF000000000000) >> 40) |
            (k >> 56)
           );
}

I would like you to focus on the claim that byte swapping should be avoided, and to address any alignment or other issues that could emerge when using a function like swap64().

basedchad21
  • Re “it is mentioned in several places that one should avoid byte swapping at all cost”: The word “cost” or “costs” do not appear in the linked page. Nor does the word “avoid.” What places are you referring to? Exactly what do they say? – Eric Postpischil Apr 17 '23 at 16:54
  • I thought the giant quote saying that anyone doing byte swaps doesn't know what they are doing would do the trick, but if you want to be so pedantic, I can give you the other two quotes: 1: `6. It swaps the bytes, a sure sign of trouble (see below).` (in the context of the example of bad code being bad) 2: `6. Never "byte swaps".` (in the context of his code being good). You could have also searched for the word "swap" – basedchad21 Apr 17 '23 at 17:00
  • What he's saying is that you shouldn't read from the stream in chunks larger than bytes, then swap the bytes when the stream is the opposite byte order than the CPU. Instead, you should read byte-by-byte from the stream and construct the value with bit-shifting, based on the stream's byte order. – Barmar Apr 17 '23 at 17:03
  • Those do not say to avoid byte swaps at all costs. You have misunderstood the article. It does not even really say to avoid byte swaps, just to avoid unnecessary byte swaps. It says the author sees them being used incorrectly or without sufficient reason. It certainly does not say to avoid byte swaps “at all costs.” There is nothing remarkable here; one should in general avoid unnecessary work of any kind, and the author contends that byte swaps are sometimes being performed unnecessarily. – Eric Postpischil Apr 17 '23 at 17:04
  • But of course you'll find lots of byte-swapping code because not everyone follows guidelines like this. And lots of it is idiomatic, such as the use of functions like `htonl()` in standard networking APIs. – Barmar Apr 17 '23 at 17:04
  • It would imply that if I save a `struct` with multi-byte members to disk, when I read it I must build the multi-byte values byte by byte. – Weather Vane Apr 17 '23 at 17:06
  • @Barmar RE: "Instead, you should read byte-by-byte from the stream and construct the value with bit-shifting, based on the stream's byte order": Why? Isn't it more work to get 4 times 1 byte instead of getting 4 at once? What are the reasons to avoid byte swapping and why is the approach of filling an array of bytes supposed to be better? – basedchad21 Apr 17 '23 at 17:08
  • @WeatherVane The Photoshop example in the blog indicates why you need to do that if the file is intended to be portable. – Barmar Apr 17 '23 at 17:08
  • @basedchad21 You're presumably reading from a buffer, not making a system call for each character. So "reading byte-by-byte" is simply array indexing. – Barmar Apr 17 '23 at 17:09
  • @Barmar I understood the article. Do you mean if the *code* is intended to be portable? – Weather Vane Apr 17 '23 at 17:10
  • @Hogan Yea, I don't know what they are doing. Anyway, I can't see your stuff, it redirects me to some ibm verification page – basedchad21 Apr 17 '23 at 17:24
  • Sorry put it on work git not public one. Please try this link: https://gist.github.com/hoganlong/c359301ddb72e17fc8777b623ee58ae1 – Hogan Apr 17 '23 at 18:14
  • @Hogan thanks. But I am trying to save data to a file in a different endianness than the host machine, so in that case I think it is legitimate for me to want to swap bytes. I was trying to find out if there are any good reasons not to do it, and I have yet to hear one. – basedchad21 Apr 17 '23 at 18:20
  • @basedchad21 -- is there a good reason to save the data to the file with a different endianness than a native write? The key here is the use case; if you can't give one then you shouldn't do it. Why bother making a hard-to-support feature no one will use? What exactly is your use case for this data file? – Hogan Apr 17 '23 at 18:23
  • The use case is that I want to do it like that for my own education. Is it so weird to have a binary file of a specific endianness, and wanting to save data to it that adheres to that endianness? – basedchad21 Apr 17 '23 at 18:27
  • @basedchad21 -- yes it is weird. This is not something anyone would do in the real world. This is why the question was deleted and why everyone on here is giving you push-back. It is exactly that -- weird. No one would ever do this because there are better ways to solve the problems that actually come up in the real world. – Hogan Apr 17 '23 at 18:42
  • *"I am trying to save data to a file in a different endianness than the host machine"* what you should instead be doing is saving data in a file in a known endianness, independent of the host machine's endianness. – dbush Apr 18 '23 at 21:43
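
The approach Barmar and dbush describe in the comments above (build each value from individual bytes in the file's declared order, rather than reading a whole word and swapping it) might look roughly like the sketch below. The helper names `load_be64` and `store_be64` are hypothetical, and big-endian is assumed as the file's byte order purely for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 64-bit value from 8 bytes that are
 * stored most-significant byte first. The host's byte order never
 * enters into it. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: write a 64-bit value as 8 bytes,
 * most-significant byte first. */
static void store_be64(unsigned char *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}
```

Because both helpers index individual bytes, they work at any offset in a buffer. No pointer is ever cast to `uint64_t *`, so the alignment concern raised in the question does not arise on this path.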

1 Answer


Should one avoid byte swapping (?)

  • Within a system, the byte order usually does not make a difference.

  • Between systems using multi-byte integers and floating point, byte order is important.


Within your own sandbox, avoid byte swapping.

Between systems:

  • Convert to a common endianness (e.g. htonl())

  • or tag the endianness of the data (e.g. TIFF)

  • or use endian-independent data (e.g. UTF-8 text).
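
As a rough illustration of the "tag the endianness" option, here is a minimal sketch in which a reader picks the decoding order from a tag stored with the data. The enum and the function name `load_u32` are hypothetical; the tag values are only modeled on the "II" (little-endian) and "MM" (big-endian) markers that open a TIFF file, and this is not a TIFF parser.

```c
#include <stdint.h>

/* Hypothetical tag values, modeled on TIFF's "II" / "MM" header bytes. */
enum byte_order { ORDER_LITTLE = 'I', ORDER_BIG = 'M' };

/* Decode 4 bytes into a 32-bit value according to the data's tag.
 * The host's own byte order is never consulted. */
static uint32_t load_u32(const unsigned char *p, enum byte_order order)
{
    if (order == ORDER_BIG)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With the bytes `12 34 56 78`, the big-endian tag yields `0x12345678` and the little-endian tag yields `0x78563412`, on any host.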

chux - Reinstate Monica
  • `avoid byte swapping` yes but why? Can Anyone give me an actual reason? – basedchad21 Apr 17 '23 at 17:12
  • @basedchad21 1) Unnecessary work 2) Typical brittle code – chux - Reinstate Monica Apr 17 '23 at 17:18
  • @basedchad21 Because if you need to byteswap you're doing something wrong, most likely not serializing or storing data in a portable manner. – Andrew Henle Apr 17 '23 at 17:23
  • The referenced article does not in fact claim that the byte order is not important. It claims, that there are portable ways of retrieving the data in native byte-order (which does not matter!) from externally supplied data. That is, your program is not supposed to care about the byte order your machine is working, but it does need to know about the source of the data. – Eugene Sh. Apr 17 '23 at 17:34
  • @basedchad21 What is the use case for `swap64()` that you have in mind? – chux - Reinstate Monica Apr 17 '23 at 20:20
  • @chux-ReinstateMonica I don't care about this question anymore, and I got my answer about the legitimacy of the claims after I found heaps upon heaps of byteswap functions in glibc which are referenced by endianness macros. – basedchad21 Apr 18 '23 at 07:13
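
For readers curious about the glibc routines mentioned in the last comment, the sketch below compares the hand-written `swap64()` from the question with `bswap_64` from `<byteswap.h>`. That header is a glibc extension, so its availability is an assumption about the platform, and the helper `swaps_agree` is hypothetical.

```c
#include <byteswap.h>  /* glibc-specific header; assumed to be available */
#include <stdint.h>

/* Hand-written swap from the question, repeated here for comparison. */
static uint64_t swap64(uint64_t k)
{
    return ((k << 56) |
            ((k & 0x000000000000FF00) << 40) |
            ((k & 0x0000000000FF0000) << 24) |
            ((k & 0x00000000FF000000) << 8) |
            ((k & 0x000000FF00000000) >> 8) |
            ((k & 0x0000FF0000000000) >> 24) |
            ((k & 0x00FF000000000000) >> 40) |
            (k >> 56));
}

/* Hypothetical helper: returns 1 when both routines agree on k. */
static int swaps_agree(uint64_t k)
{
    return swap64(k) == bswap_64(k);
}
```

Both routines take and return plain values, so neither raises an alignment issue on its own. Alignment only comes into play if the caller obtains the argument by casting a byte pointer to `uint64_t *`.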