
In a (real-time) system, computer 1 (big endian) gets integer data from computer 2 (which is little endian). Given that we do not know the size of int, I check it with a switch on sizeof() and use the matching __builtin_bswapX accordingly, as follows (assume that this builtin is available).

...
int data, intVal;
getData(&data); // not the actual function call; just represents where the data comes from.
...
switch (sizeof(int)) {
case 2:
    intVal = __builtin_bswap16(data);
    break;
case 4:
    intVal = __builtin_bswap32(data);
    break;
case 8:
    intVal = __builtin_bswap64(data);
    break;
default:
    break;
}
...

Is this a legitimate way of swapping the bytes of integer data, or is this switch statement totally unnecessary?

Update: I do not have access to the internals of getData() method, which communicates with the other computer and gets the data. It then just returns an integer data which needs to be byte-swapped.

Update 2: I realize that I caused some confusion. The two computers have the same int size but we do not know that size. I hope it makes sense now.

erol yeniaras
  • Sizeof returns bytes, not bits – Ctx Jan 19 '16 at 21:32
  • How is that related to real-time? Also that is a gcc C extension in the first place. And **please** read what the `sizeof` operator actually is for. – too honest for this site Jan 19 '16 at 21:32
  • Is the size of an `int` the same on both platforms? – Karoly Horvath Jan 19 '16 at 21:33
  • Why do you think using gcc **extensions** would be **standard** compliant? – too honest for this site Jan 19 '16 at 21:34
  • @Olaf No one was thinking that. You're the only one bringing standard compliance into this. –  Jan 19 '16 at 21:35
  • If that is about exchanging data between two systems, use a common exchange format and proper serialisation, e.g. from an octet-buffer, on both sides. Good practice is that no side has to care about properties of the other - except for the communication link, of course. – too honest for this site Jan 19 '16 at 21:35
  • All: Sorry for the sizeof error! – erol yeniaras Jan 19 '16 at 21:36
  • @hvd: I don't think so, my experience has been that every C++ question on SO is assumed to be about the standards, unless explicitly stated otherwise. If op only wants code that works on gcc he could say so – Chris Beck Jan 19 '16 at 21:37
  • @hvd: The question clearly asks if it is legitimate without information on what he thinks that actually means. As the question is tagged for two different languages, one has to assume it is about the standards. – too honest for this site Jan 19 '16 at 21:41
  • Too much assumptions and relying on compiler internals - no good –  Jan 19 '16 at 21:41
  • @KarolyHorvath, I don't know, I tend to agree with the language lawyers on this. It's important when people are reading answers that they know when the question is about C, C++, or MSVC-only or gcc only, etc. If the question is tagged as [gcc], for example it makes it easy for people to figure out. Also a lot of questions don't really have clear / authoritative answers if its not based on the standard. – Chris Beck Jan 19 '16 at 21:41
  • How are you transferring the int between the machines if you do not know its size? – Doug Currie Jan 19 '16 at 21:43
  • @Olaf When multiple interpretations are possible, try to pick an interpretation that makes sense. "Legitimate" has multiple meanings but it's pretty clearly not used as "conforming to a particular ISO specification" in the context of this question. –  Jan 19 '16 at 21:43
  • The real problem with the code in the question is that it assumes that an `int` has the same size on both computers. The communication protocol used between the two computers should be explicit about both the size and endianness of every item sent. – user3386109 Jan 19 '16 at 21:44
  • @hvd: Feel free to provide a more appropriate interpretation from the information given! – too honest for this site Jan 19 '16 at 21:44
  • @user3386109: See some comments earlier. That's exactly what I wrote. – too honest for this site Jan 19 '16 at 21:45
  • @Olaf Okay: a perfectly sensible interpretation would be that the question is asking if the code is correct. Correct as in supported by the particular compiler that the OP is using, as in giving the OP the results the OP is looking for. What the standards call a "conforming" program, rather than what the standards.call a "strictly conforming" program. –  Jan 19 '16 at 21:47
  • @hvd: I might have overlooked, but where actually does the question **before the last edit** state which compiler is used **and** if extensions are allowed or not? – too honest for this site Jan 19 '16 at 21:49
  • @ChrisBeck: *most* OPs aren't even aware that there is a (are...) standard(s). While I agree that a tag should be there, I strongly disagree that questions should be interpreted with that assumption. *Most* of the time that assumption is just wrong. – Karoly Horvath Jan 19 '16 at 21:49
  • *"We do not know the size of int in advance in neither machine."* So what is your strategy when the transmitted int value won't fit the local int range? – Weather Vane Jan 19 '16 at 21:50
  • @Olaf: where does it mention the standard? :) – Karoly Horvath Jan 19 '16 at 21:50
  • @KarolyHorvath: after re-reading [this meta post](http://meta.stackoverflow.com/questions/281197/re-tagging-c-questions-as-c) I guess that I am wrong, and I'm not supposed to assume that [c++] tag refers to any standard -- I guess we are only supposed to do that when they use a standard tag like [c++11] or something? – Chris Beck Jan 19 '16 at 21:50
  • @Olaf It didn't, so why were you assuming that extensions weren't allowed? Again, when multiple interpretations are possible and some make sense, some are just plain ridiculous, try not to attack the post based on such a ridiculous interpretation. –  Jan 19 '16 at 21:52
  • @ChrisBeck: No idea. I just use common sense. Novice users have no idea about these issues, and should be informed gently. – Karoly Horvath Jan 19 '16 at 21:52
  • Friends, I apologize if I cause confusion. getData() method deals with the connection to other computer and gives me the integer data (to computer 1). I do not have access to the internals of getData(). All I know is that the data is from a little endian machine and my machine is big endian so all I need is to swap the bytes. Am I thinking wrong? – erol yeniaras Jan 19 '16 at 21:53
  • @KarolyHorvath: Sorry, but then there is no basis to qualify code as correct or not at all. We generally have to assume some common denominator, and for C and C++ there - luckily - are international and well-established standards, not just norms (caution for German readers: two false friends!). And the info pages here provide information about the standards. Admittedly, the reference should be more clear, but it is still there. – too honest for this site Jan 19 '16 at 21:54
  • @ChrisBeck: Without a specific common basis, how would we judge whether code is correct or not? How would we define UB? I strongly disagree with not assuming a standard just because no _specific_ standard is given. A specific tag for e.g. C90 is actually good to reference a specific version, while the `C` tag itself should reference the current standard. But I agree the info-page and the popup should make that clear. – too honest for this site Jan 19 '16 at 21:54
  • @Olaf: You're *right* and at the same time horribly wrong. You're out of touch with *reality*. Read my other comments ;) – Karoly Horvath Jan 19 '16 at 21:55
  • @Olaf, in that meta post, consensus answer reads like `If someone tags a question as C++, then they are intending to write and compile C++ code. Even if the code is horrible, and they have likely compiled it using a copy of UnicornsC++Compiler that does not follow the standard of C++, they still want an answer that makes it work in C++. Retagging it to C is not helpful.` So I guess when the question is ambiguous we are supposed to encourage them to tag as [c++11] or whatever version they target, but not assume that just because it is [c++] tag – Chris Beck Jan 19 '16 at 21:56
  • @erolyeniaras Okay, so you know / can assume already that `getData()` works, and works well? And you only need something that works on your current system while connecting to that other system? Then you *do* know the size of `int`, don't you? It's whatever size `int` has on your system. Which matches the size of `int` on that other system. Because anything else would mean that `getData()` cannot work properly. Or am I misunderstanding you? –  Jan 19 '16 at 21:58
  • @erol yeniaras, if the `getData()` method deals with the connection, then it must know the size and endianness to work correctly. E.g., what if the remote has a 4-byte `int` and the local has a 2-byte `int`? Which two bytes are discarded? Without knowing endianness, how can `getData()` decide? – Doug Currie Jan 19 '16 at 21:58
  • Concerning comment ["all I need is to swap the bytes"](http://stackoverflow.com/questions/34887269/changing-the-endiannes-of-an-integer-which-can-be-2-4-or-8-bytes-using-a-switch/34887561#comment57513703_34887269) - This is wrong if the `int` size is not known. **Both** data size and endian is needed. – chux - Reinstate Monica Jan 19 '16 at 21:59
  • @KarolyHorvath: Everyday usage proves that claim wrong (and one could very well interpret it as offending). While you are right novice users are often not aware a standard exists, they tend to very well accept it once pointed to it. Even more students or novices, etc. which have to fight against ignorant tutors, co-workers, etc. – too honest for this site Jan 19 '16 at 22:00
  • @ChrisBeck: That is about re-tagging, which could not be done here, as OP refuses to clarify. It is not about assuming standard compliance for the normal tags. I still got no answer to the question of what to assume correct if no standard or other common basis can be assumed. So it comes back to K&R again? - Hopefully not! – too honest for this site Jan 19 '16 at 22:03
  • @chux: Fully agreed. I already left a comment about that. How the actual swap is done is a secondary task which can be implemented by a student. – too honest for this site Jan 19 '16 at 22:04
  • @hvd: I am getting the `int` size of my system using sizeof() since my software might be running on a 16, 32 or 64 bits system. getData() is reading the data from a network buffer and fills the integer value without us knowing the specifics. – erol yeniaras Jan 19 '16 at 22:05
  • @Olaf: I mean if not enough can be assumed and there's no way to answer, then I guess we should tell them they need to specify what standard / compiler, or it's too broad and close it. But it seems that we might not be supposed to assume that everyone is targeting a standard (or *which* standard?) Also in the case of this question, OP is clearly not targeting a standard. So idk if it's good to try to shoe-horn his question into one about the standards. – Chris Beck Jan 19 '16 at 22:07
  • @erolyeniaras: Guess what happens if you use that code and connect a 16 and a 64 bit machine? – too honest for this site Jan 19 '16 at 22:09
  • @erolyeniaras Does that mean that on a system with 16 bit `int`, `getData()` reads 2 bytes from the network buffer, but on a system with 64 bit `int`, that same single call reads 8 bytes? When connected to the same remote system? –  Jan 19 '16 at 22:09
  • @ChrisBeck: Sorry, but then there is no basis to tell a user some behaviour is undefined, or unspecified or implementation defined. Guess what happens if you first have to ask which standard, compiler, compiler-options (`gcc -std=c11` vs. `-std=gnu11`), compiler version (for the default standard: gcc pre-5 or post-5 uses c90 resp. c11), etc. That will create even more confusion for the novices than just telling them what the standard says. (And the linked post is not about standards anyway). – too honest for this site Jan 19 '16 at 22:14
  • @erolyeniaras Then sorry, but I'm having a bit of trouble seeing how this would work. If the remote server sends 64 bits, then when a client with 16-bit `int` reads it, it would read it as four `int` objects. Yet when a client with 64-bit `int` reads that same data, it would read it as a single `int` object. You do want to treat the data the same way on both clients, right? If you do, then wouldn't you need to join or split the bytes first until you have whatever number of bytes the server uses, and *then* byteswap, regardless of how large `int` is on the client? –  Jan 19 '16 at 22:23
  • Ok, now it doesn't make any sense. I was reading the question in the context of the size of int being always the same on both machines. That was actually similar to something I used to work with. When the size of int is different, this whole code is meaningless, and I rest my case. – SergeyA Jan 19 '16 at 22:27
  • @Olaf and others, I opened a meta thread here, please share your thoughts: http://meta.stackoverflow.com/questions/315066/suggestion-add-a-note-to-c-tag-explaining-how-best-to-use-it – Chris Beck Jan 19 '16 at 23:21

4 Answers


Seems odd to assume the size of int is the same on 2 machines yet compensate for variant endian encodings.

The code below only tells you the int size of the receiving side, not the sending side.

switch(sizeof(int))

sizeof(int) is the size of an int on the local machine, measured in units of char (bytes, not bits). To get the bit width, use sizeof(int)*CHAR_BIT. [OP has edited the post]

The sending machine should detail the data width, as 16-, 32-, or 64-bit, without regard to its int size, and the receiving end should be able to detect that value as part of the message, or an agreed-upon width should be used.
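One way to make the width explicit is to put it in the message itself. A sketch of such framing (the answer does not prescribe a wire format; `frameInt`/`unframeInt` are invented names, and little-endian payload order is an arbitrary choice here):

```c
#include <stddef.h>
#include <stdint.h>

/* Prefix each integer with its width in bytes, so the receiver
   never has to guess the sender's sizeof(int). */
size_t frameInt(uint8_t *out, uint64_t value, uint8_t width) {
    out[0] = width;                                 /* explicit width field */
    for (uint8_t i = 0; i < width; ++i)
        out[1 + i] = (uint8_t)(value >> (8 * i));   /* little-endian payload */
    return (size_t)1 + width;
}

uint64_t unframeInt(const uint8_t *in, size_t *consumed) {
    uint8_t width = in[0];
    uint64_t value = 0;
    for (uint8_t i = 0; i < width; ++i)
        value |= (uint64_t)in[1 + i] << (8 * i);
    *consumed = (size_t)1 + width;
    return value;
}
```

Either side can then send 2-, 4-, or 8-byte integers and the other side decodes them without knowing the peer's int size.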

Much like hton() converts from local endian to network endian, the trend with these functions is toward fixed-width integers:

#include <netinet/in.h>

uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);

So I suggest sending/receiving the "int" as a 32-bit uint32_t in network endian.
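A minimal sketch of that suggestion, assuming a POSIX system for `arpa/inet.h` (`sendInt`/`recvInt` are invented names): both sides agree on a 32-bit big-endian wire format, so neither needs to know the other's native int size or endianness.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

void sendInt(uint8_t out[4], uint32_t hostValue) {
    uint32_t net = htonl(hostValue);   /* host -> network (big) endian */
    memcpy(out, &net, sizeof net);     /* these 4 bytes go on the wire */
}

uint32_t recvInt(const uint8_t in[4]) {
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);                 /* network -> host endian */
}
```

On a big-endian host htonl/ntohl are no-ops; on a little-endian host they swap, so both ends behave identically without any sizeof(int) switch.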


[Edit]

Consider that computers exist with different endians (little and big are the most common; others exist) and various int sizes: bit width 32 (common), 16, 64, maybe even some odd-ball 36-bit, with room for growth to 128-bit. Let us assume N combinations. Rather than write code to convert from 1 of N to N different formats (N*N routines), let us define a network format and fix its endian to big and its bit width to 32. Then each computer does not care about, nor need to know, the int width/endian of the sender/recipient of data. Each platform gets/receives data via a locally optimized method converting between its own endian/int and the network endian/int-width.

OP describes not knowing the sender's int width yet hints that the int width on the sender/receiver might be the same as the local machine. If the int widths are specified to be the same and the endians are specified to be one big/one little as described, then OP's coding works.

However, such an "endians are opposite and int-width the same" situation seems very selective. I would prepare code to cope with an interchange standard (a network standard), as certainly, even if today it is "opposite endian, same int", tomorrow it will evolve toward a network standard.

chux - Reinstate Monica
  • getData() method deals with the connection to other computer and gives me the integer data (on local machine). All I know is that the data is from a little endian machine and my machine is big endian so all I need is to swap the bytes. – erol yeniaras Jan 19 '16 at 21:52
  • @erol yeniaras So how does `getData()` know the width of an `int` on the sending machine? Without that information, `switch(sizeof(int))` is of no value. "we do not know the size of int in advance in neither machine," – chux - Reinstate Monica Jan 19 '16 at 21:55
  • @erolyeniaras: You have to define the transmission format for _both_ sides. You must not depend on properties of one side. – too honest for this site Jan 19 '16 at 22:08
  • @Olaf: how can I define the transmission format of the other machine if I do not have control over it? The other machine is like a black box to me. All I see is getData(&int) method. – erol yeniaras Jan 19 '16 at 22:13
  • @erolyeniaras Who wrote the `getData` function? – user3386109 Jan 19 '16 at 22:17
  • @user3386109: not me. I promise :) – erol yeniaras Jan 19 '16 at 22:19
  • @erolyeniaras: Wenn ich eine andere Sprache spreche, wie willst du mich dann verstehen? - Did you understand the sentence before? If not: that is exactly the problem you are facing. If you do not use a common data format (i.e. "language") on both sides, you will not be able to communicate - at best. At worst, there will be misunderstanding and misinterpretation. The first thing for communication systems is to define common protocols at every level (see "OSI layers"). Both sides have to comply with that. (First sentence translated: "If I speak a different language, how do you understand me?") – too honest for this site Jan 19 '16 at 22:20
  • I am not sure why all the conundrum about the size of int. It is conceivable that two machines have the same size for int, and always the same (it is part of the requirements for the system), but endian is different. In fact, I worked in such an env myself. – SergeyA Jan 19 '16 at 22:21
  • @SergeyA: OP already stated there will be from 16 to 64 bit machines. And how do you get that `int` always has the same size? That is just wrong - standard left aside, there are even C compilers with 8 bit `int`. – too honest for this site Jan 19 '16 at 22:24
  • @Olaf, the whole question had nothing to do with the standard from the very beginning. Endianness is not part of the standard. And you get the same size of int by simply stating a requirement on your environment. As simple as that. – SergeyA Jan 19 '16 at 22:26
  • @SergeyA: You don't program embedded systems, do you? Please show me the option that tells gcc which bit width `int` has. I could really use that. – too honest for this site Jan 19 '16 at 22:29
  • @Olaf, I do not, but how is it relevant? – SergeyA Jan 19 '16 at 22:30
  • @Olaf: What you are saying makes sense, however in my case can we assume that "getData(int& data)" is a method that reads a common buffer (between the two computers) and for some reason creates the int data in reverse byte order, and I need to convert it to big endian without knowing my system's int size in advance? – erol yeniaras Jan 19 '16 at 22:32
  • @SergeyA: Seriously and without offence: You really would talk different if you did. There are good reasons C (and C++) has become that widely used in such a broad area, from 8 bit MCUs to 64 bit clusters. The most important is its flexibility, including the widths and encodings of the standard types. – too honest for this site Jan 19 '16 at 22:34
  • @Olaf, you are not hearing me. C++ is used in OTHER places as well, having nothing to do with embedded development. It is perfectly fine for programs in such environments to have some restrictions. For instance, they might require a certain size of int. I assumed this is the case with OP, because it is part of my background. – SergeyA Jan 19 '16 at 22:36
  • @erolyeniaras: First make clear which language you are actually using. As stated multiple times, C and C++ are different languages and you should/can not use some techniques of one in the other. Second, see DirkHerrmann's answer for a C implementation. While it might need some finish, it is a good starter (just read the comments, too). – too honest for this site Jan 19 '16 at 22:36
  • @SergeyA: "they might require a certain size of int" Well, the width and encoding of `int` is _always_ implementation defined. The same for the other types. For example, POSIX64 (e.g. Linux and Unix) requires a `long` to be 64 bits. On Windows64, it is 32 bits. There were (possibly still are - I lost contact) systems which have 64 bit `int`. etc. That is not only a matter of embedded systems, but all implementations. – too honest for this site Jan 19 '16 at 22:40
  • @Olaf: I have read all the comments. The system has both C and C++ code running on it, but this particular code is in a C file. Assuming the machines can have different int sizes was not sensible, I get it. The only difference between them is endianness. Does this make sense? – erol yeniaras Jan 19 '16 at 22:47
  • @erolyeniaras: As that is about C, would you please remove the C++ tag then? And no, assuming they can have different sizes of `int` is perfectly sensible! Just writing a function which uses the local size is nonsense. Use the fixed-width types on both machines, define the transfer format (size, encoding and endianness) and implement the functions for serialisation and deserialisation for both sides. If done properly, that is completely independent of endianness, but it has to observe encoding iff(!) you have to use signed integers (not recommended). – too honest for this site Jan 19 '16 at 22:48
  • @Olaf: That is an excellent answer which clears many things! Thanks! I removed the c++. Actually, the `data` is of type `int` but it never gets a negative value (it is the size of some array), so I guess using unsigned integer of fixed size might have been more useful to transfer that piece of information from C2 to C1. – erol yeniaras Jan 19 '16 at 22:58
  • @erolyeniaras: Do not see the data as `int`, but as e.g. `int32_t`, i.e. four octets/`uint8_t`. Please do some research about `stdint.h` in C. You will need it. And in fact that is the only reasonable approach. I don't earn my money just for smart looking ;-) – too honest for this site Jan 19 '16 at 23:11
  • @erolyeniaras The following post may be useful: http://stackoverflow.com/q/20077313/2410359 – chux - Reinstate Monica Jan 19 '16 at 23:12

A portable approach would not depend on any machine properties, but only rely on mathematical operations and a definition of the communication protocol that is also hardware independent. For example, given that you want to store bytes in a defined way:

void serializeLittleEndian(uint8_t *buffer, uint32_t data) {
    size_t i;
    for (i = 0; i < sizeof(uint32_t); ++i) {
        buffer[i] = data % 256;
        data /= 256;
    }
}

and to restore that data to whatever machine:

uint32_t deserializeLittleEndian(uint8_t *buffer) {
    uint32_t data = 0;
    size_t i;
    for (i = sizeof(uint32_t); i > 0; --i) {
        data *= 256;
        data += buffer[i - 1]; /* most significant byte sits last in the buffer */
    }
    return data;
}

EDIT: This is not portable to systems with other than 8 bits per byte, due to the use of uint8_t and uint32_t. The use of uint8_t implies a system with 8-bit chars. However, the code will simply fail to compile on systems where these conditions are not met. Thanks to Olaf and Chqrlie.
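A shift-and-mask variant of the same little-endian (de)serialization, as one commenter suggests; a sketch only (`serializeLE`/`deserializeLE` are my names, to avoid clashing with the answer's functions):

```c
#include <stddef.h>
#include <stdint.h>

void serializeLE(uint8_t *buffer, uint32_t data) {
    for (size_t i = 0; i < sizeof data; ++i)
        buffer[i] = (uint8_t)(data >> (8 * i));   /* least significant byte first */
}

uint32_t deserializeLE(const uint8_t *buffer) {
    uint32_t data = 0;
    for (size_t i = 0; i < sizeof data; ++i)
        data |= (uint32_t)buffer[i] << (8 * i);   /* reassemble from low to high */
    return data;
}
```

Like the division/modulo version, this depends only on arithmetic on values, not on the host's byte order.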

Dirk Herrmann
  • Formally there still is a problem with `sizeof`: it returns the number of bytes, not octets. Note that a byte is not necessarily 8 bits. But as we apparently do not care about the standard anymore, feel free to ignore my comment. – too honest for this site Jan 19 '16 at 22:06
  • OTOH, as you use `uint8_t`, you are very safe to just use `4` instead of `sizeof`. – too honest for this site Jan 19 '16 at 22:06
  • `uint8_t` and `uint32_t` are only available on environments where these types are exactly 8 resp. 32 bits. On such an architecture, `char` is necessarily 8 bits and `sizeof(uint32_t)` equals `4`. – chqrlie Jan 19 '16 at 22:10
  • @Olaf You are correct. The only people who care about systems where a byte is not 8 bits is people who have systems where a byte is not 8 bits. And that is a very small number of people indeed. – user3386109 Jan 19 '16 at 22:21
  • @user3386109: Still no reason to ignore it without need. Anyway, you might have noticed I absolutely have no problem with fixed-width types from `stdint.h`. They exist for very good reasons. I might try sticking to the standard where reasonable, but I very well live (and work) in the real world. – too honest for this site Jan 19 '16 at 22:27
  • Thanks a lot for these hints - I tried to improve the answer based on your comments. And while the code is (hopefully) formally correct, it is not any more portable than before, which means that it will still not work on systems with, say, 10 bit bytes. That is probably the sad thing for those with unusual systems: There is not more code working for them, only more code that already fails compiling (at least). – Dirk Herrmann Jan 19 '16 at 22:29
  • @DirkHerrmann: Well, if it compiles, it will also work. Although I have no idea how these systems can provide `uint8_t` without 8 bit bytes, while still complying to the standard ;-). My comment was more about sometimes `sizeof` is useless and magic numbers are very well acceptable. But I prefer a generic function, where I just pass the number of octets. For a communication link, that is actually more important than the width of the result type. – too honest for this site Jan 19 '16 at 22:47
  • Oddly enough, this doesn't compile into `bswap` for x86 with GCC or Clang. Here's *portableish* [code](http://goo.gl/2Snl4I) that does compile to `bswap` with Clang though. – Jason Jan 19 '16 at 22:53
  • @Jason: Feel free to provide an answer. That link does not work. Anyway, optimisation is very certainly not a problem OP currently has or should care about. Even more, as that seems to run on different architectures; optimisation for one arch might generate bad code on a second. Also, there is no need for a byteswap, as that is proper serialisation. – too honest for this site Jan 19 '16 at 22:54
  • @Olaf Odd. I posted another link in an answer. – Jason Jan 19 '16 at 23:15
  • @Jason: Yes. normally reconq works with most sites. Anyway, that was not important enough for me to start Firefox. However; I read your answer. Sorry, but this one is fine for me. I only would change to shifts & bitops, casts (to silence some warnings) and add some fine-tuning. – too honest for this site Jan 19 '16 at 23:22
  • @Olaf The code is *portableish*, and correct which is what's most important, but it doesn't optimize well. I honestly wish the C and C++ standards would just add it. Not everyone has POSIX, and hand coding a `bswap` is error prone and typically somewhere critical performance can easily be lost. – Jason Jan 19 '16 at 23:29
  • @Jason: Just to repeat: That is not for byteswap, but (de)serialisation. Please read the comments; this is an XY-problem. – too honest for this site Jan 19 '16 at 23:42
  • @Olaf I read the comments before I posted. X was largely already addressed. – Jason Jan 20 '16 at 00:10

Yes, this is totally cool - given that you fix your switch for the proper sizeof return values (bytes, not bits). One might be a little fancier and provide, for example, template specializations based on the size of int. But a switch like this is totally fine and will not produce any branches in optimized code.
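To illustrate the "no branches" claim, here is the question's switch wrapped in a helper; a sketch assuming GCC or Clang, since the __builtin_bswapNN functions are compiler extensions (`bswap_int` is an invented name). Because sizeof(int) is a compile-time constant, the compiler folds the switch away and emits only the matching swap.

```c
#include <stdint.h>

static inline int bswap_int(int data) {
    switch (sizeof(int)) {          /* constant: folded at compile time */
    case 2: return (int)__builtin_bswap16((uint16_t)data);
    case 4: return (int)__builtin_bswap32((uint32_t)data);
    case 8: return (int)__builtin_bswap64((uint64_t)data);
    default: return data;           /* unexpected int width: leave unswapped */
    }
}
```

Inspecting the generated assembly (e.g. with `gcc -O2 -S`) shows a single swap instruction on x86, with no comparisons.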

SergeyA
  • The question asks if that is a legitimate, i.e. standard-compliant, way. It certainly is not! – too honest for this site Jan 19 '16 at 21:34
  • @Olaf, this is standard compliant. Is there undefined behaviour? No, there is none. Is the program malformed? No, it is not (I take sizeof() return value to be an artifact of SO question). – SergeyA Jan 19 '16 at 21:35
  • I have encountered compilers (gcc based!) for some platforms not supporting `__builtin_bswap*`. I think TI's ARM compiler does not support it. – Eugene Sh. Jan 19 '16 at 21:36
  • @SergeyA While I agree that it is standard compliant it is nevertheless senseless. If the two communicating nodes have differing sizes of int, decoding will fail anyway. So it must be determined by the protocol which (fixed) size to use for numbers – Ctx Jan 19 '16 at 21:46
  • Please provide a link where `int` has a defined size. And to the standard's definitions of the `__builtin_bswapN` functions. Until then, it clearly is **not**. – too honest for this site Jan 19 '16 at 21:47
  • @Olaf Using extensions makes the program not *strictly* conforming, but it *is* still conforming. – edmz Jan 19 '16 at 21:48
  • Not sure what you mean by "totally cool" exactly but the code will not work if the sender had a different value for `sizeof(int)` than the receiver – M.M Jan 19 '16 at 21:48
  • @user3386109: The compiler will optimize out the switch logic. It is cumbersome to do this with the preprocessor as `sizeof(int)` cannot be used in a preprocessor `#if` directive, you would need to use `INT_MAX` and hardcoded values. – chqrlie Jan 19 '16 at 22:05
  • @user3386109 The optimization is referred to as [constant folding](https://en.wikipedia.org/wiki/Constant_folding). – Jason Jan 19 '16 at 22:09
  • The `getData` function cannot possibly work, unless it knows the sizes and endianness on both machines, so the code in the question is not totally cool. It is, in fact, nonsense. – user3386109 Jan 19 '16 at 22:15
  • @M.M., I am not sure why all the conundrum about the size of int. It is conceivable that two machines have the same size for int, and always the same (it is part of the requirements for the system), but endian is different. In fact, I worked in such an env myself. – SergeyA Jan 19 '16 at 22:21
  • @user3386109: May be getData knows stuff that I do not know. – erol yeniaras Jan 19 '16 at 22:23
  • @erolyeniaras Maybe you need to find out how `getData` works. – user3386109 Jan 19 '16 at 22:23
  • @SergeyA the question says "we do not know the size of int in advance in either machine " . It doesn't say "we know both machines have the same size int" – M.M Jan 19 '16 at 22:25
  • @M.M, I read the question on the first edit, when it didn't state this clearly. Now I see no sense in the question. – SergeyA Jan 19 '16 at 22:30
  • @SergeyA: Your assumption was correct. "we do not know the size of int in advance in either machine" is not very clear, I agree. – erol yeniaras Jan 19 '16 at 22:54

As already mentioned, you generally want to define a protocol for communications across networks, which the hton/ntoh functions are mostly meant for. Network byte order is generally treated as big endian, which is what the hton/ntoh functions use. If the majority of your machines are little endian, it may be better to standardize on it instead though.

A couple of people have been critical of using __builtin_bswap, which I personally consider fine as long as you don't plan to target compilers that don't support it. Although, you may want to read Dan Luu's critique of intrinsics.

For completeness, I'm including a portable version of bswap that (at very least Clang) compiles into a bswap for x86(64).

#include <stddef.h>
#include <stdint.h>

size_t bswap(size_t x) {
  /* Swap byte i with its mirror byte d, working inward from both ends. */
  for (size_t i = 0; i < sizeof(size_t) >> 1; i++) {
    size_t d = sizeof(size_t) - i - 1;

    size_t mh = ((size_t) 0xff) << (d << 3);  /* mask selecting the high byte */
    size_t ml = ((size_t) 0xff) << (i << 3);  /* mask selecting the low byte */

    size_t h = x & mh;
    size_t l = x & ml;

    /* Exchange the two bytes; (d - i) << 3 is the bit distance between them. */
    size_t t = (l << ((d - i) << 3)) | (h >> ((d - i) << 3));

    x = t | (x & ~(mh | ml));
  }

  return x;
}
Jason
  • Not sure what you are trying to accomplish. The code is much more complicated than the one in the other answer. Also it uses the wrong types; OP clearly requires fixed-width types. This is no application for `size_t`. – too honest for this site Jan 19 '16 at 23:18
  • @Olaf It's a *portableish* byteswap that optimizes correctly on at least one platform with at least one compiler. The code isn't that hard to read. I copied the code from one of my C projects, but it's fairly easy to templatize (e.g. `size_t` -> `T`). – Jason Jan 19 '16 at 23:22
  • But the question is about (de)serialisation, not swapping. It is an XY-problem actually. Please see the latest comments. Even if it were, `size_t` is not appropriate here, as OP clearly needs a defined-width type, which `size_t` clearly is not (actually it is worse than `int` for typical implementations, as it really spans from 16 to 64 bits for widely used targets). – too honest for this site Jan 19 '16 at 23:26
  • @Olaf Yes, OP needs a defined protocol, or to define one. Using fixed width types (e.g. `uint32_t`/`uint64_t`) is the cleanest way to do that. – Jason Jan 19 '16 at 23:32