
So, my question is pretty simple:

I need to fill a char/unsigned char array with some information. Some values in the middle are taken from short/int types and this is what happens:

Code:

```c
#include <string.h>  /* memcpy, memmove */

int foo = 15; // 0x0000000F
unsigned char buffer[100] = {0};

...
memcpy(&buffer[offset], &foo, sizeof(int)); // or memmove
...
```

Output:

... 0F 00 00 00 ...

So for now I have written a function to reverse these fields, but I don't find it a smart solution, as it costs execution time, resources, and development time.

Is there an easier way to do it?

Edit: As many of you have pointed out, this behaviour is caused by the processor being little-endian, but my problem remains. I need to fill this buffer with int/short values in big-endian order, because I need to serialize the data for transmission to another machine. Whether that machine is little- or big-endian doesn't matter, because the protocol already defines the byte order.

Note: For compiling in C++

Joster
  • [Read about endianness](https://en.wikipedia.org/wiki/Endianness). – Some programmer dude Mar 15 '17 at 14:09
  • `htonl()` is a function you can call to reverse the byte order of your 32-bit ints when running on a little-endian system. It's typically implemented very efficiently (and it's a no-op when running on a big-endian system). – Jeremy Friesner Mar 15 '17 at 14:12
  • Why the C++ tag? It's an unrelated language and provides other ways to do this. – too honest for this site Mar 15 '17 at 14:25
  • @Olaf The code and question are just as relevant to C++ as C. I think the tag is fair. – JeremyP Mar 15 '17 at 14:29
  • either a C or C++ developer could help me. – Joster Mar 15 '17 at 14:34
  • The functions are C functions, the mechanism is C. A C++ programmer should use other features. And tags are not to attract max. attention. Tag-spam is not well received. – too honest for this site Mar 15 '17 at 14:40
  • @Joster: from your edit, you seem to understand the issue of endianness. Use the method exposed in my answer on both ends of the transmission. Messing with `ntoh` macros is error prone as you will need an intermediary variable since you should not attempt to store the *reversed* value directly into the buffer at a potentially unaligned address. These expressions tend to compile efficiently with modern compilers. – chqrlie Mar 15 '17 at 14:41
  • @Olaf: I agree a C++ programmer should not use `memcpy`, but he could use the shift expressions. – chqrlie Mar 15 '17 at 14:42
  • @Olaf considering the questioner is actually asking *what the other methods are*, that does not disqualify C++ unless he is unwilling to compile the code as C++. With the C++ tag we have to assume that he is willing to compile the code as C++ – JeremyP Mar 15 '17 at 14:55
  • @JeremyP: That would make the question too broad. We are a Q&A site, not a consulting site. It also was reason for DV for not showing research effort. Feel free to ask on meta. Although I'm pretty sure it will be closed as dup of a dup of a dup... – too honest for this site Mar 15 '17 at 15:01
  • @JeremyP the code/question may be C/C++ applicable, but the best answers may differ from C and C++ and OP has said "For compiling in C++", so only the C++ should exist. – chux - Reinstate Monica Mar 15 '17 at 15:15
  • "I need to fill this buffer with int/short values in big-endian" & "For compiling in C++" --> then why accept an answer (a good C one) that does not use C++ function overloading for `int` and `short`? – chux - Reinstate Monica Mar 15 '17 at 15:20
  • So the OP has just added "for compiling in C++", this means that somebody should post a C++ solution and the OP should change his acceptance to that answer. – JeremyP Mar 15 '17 at 15:31
  • Although, of course, the question is now a duplicate of this http://stackoverflow.com/questions/105252/how-do-i-convert-between-big-endian-and-little-endian-values-in-c – JeremyP Mar 15 '17 at 15:32

4 Answers


It's because the processor architecture you use is little endian. Multibyte numbers (anything bigger than a uint8_t) are stored with the least significant byte at the lowest address.
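A quick way to see this on your own machine is to copy an int into a byte buffer, exactly as the question does, and print the bytes in address order. This is a minimal sketch; `dumpNativeBytes` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Return the bytes of `value` as they are laid out in memory, lowest address
// first, formatted as hex pairs such as "0F 00 00 00".
// The result depends on the byte order of the machine running the code.
std::string dumpNativeBytes(int value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // the same copy the question performs
    std::string out;
    for (std::size_t i = 0; i < sizeof value; ++i) {
        char hex[3];  // two hex digits plus the terminating '\0'
        std::snprintf(hex, sizeof hex, "%02X", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += hex;
    }
    return out;
}
```

With a 4-byte int on an x86 machine, `dumpNativeBytes(15)` should produce `0F 00 00 00`; a big-endian host would give `00 00 00 0F`.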

Edit

What you do about it really depends on what the buffer is for. If you are only going to use the buffer internally, forget about byte swapping: you would have to do it in both directions, and it's a waste of time.
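To illustrate the internal-only case: if the same program writes a value into the buffer and later reads it back, copying in native order both ways returns the original value without any swapping. A minimal sketch; `roundTripNative` is a hypothetical name.

```cpp
#include <cstring>

// Copy `value` into a byte buffer and back again, both times in the host's
// native byte order. Because the same machine does both copies, the value
// survives unchanged and no byte swapping is needed.
int roundTripNative(int value) {
    unsigned char buffer[sizeof value];
    std::memcpy(buffer, &value, sizeof value);    // write: native order
    int result = 0;
    std::memcpy(&result, buffer, sizeof result);  // read back: same native order
    return result;
}
```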

If it is for some external entity, e.g. a file or a network protocol, the specification of the file format or network protocol will say what the endianness is. For example, network byte order, used by the Internet protocols, is big-endian. The networking library provides a family of functions to convert values for use in sending and receiving Internet protocol messages. See for instance

https://linux.die.net/man/3/htonl
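A sketch of how those functions might be used to fill the question's buffer, assuming a POSIX system (`<arpa/inet.h>`; on Windows, `htonl`/`htons` are declared in `<winsock2.h>`). `putUInt32Network` and `putUInt16Network` are hypothetical helper names. Following chqrlie's comment on the question, the converted value goes into a local variable first and is then copied with `memcpy`, rather than stored through a cast pointer at a possibly unaligned offset.

```cpp
#include <arpa/inet.h>  // htonl, htons (POSIX)
#include <cstddef>
#include <cstdint>
#include <cstring>

// Store a 32-bit value at buffer[offset] in network (big-endian) byte order.
// htonl converts to network order; memcpy then copies the bytes, so the
// destination does not have to be suitably aligned for a uint32_t.
void putUInt32Network(unsigned char* buffer, std::size_t offset, std::uint32_t value) {
    const std::uint32_t networkValue = htonl(value);
    std::memcpy(buffer + offset, &networkValue, sizeof networkValue);
}

// Same for a 16-bit value, using htons.
void putUInt16Network(unsigned char* buffer, std::size_t offset, std::uint16_t value) {
    const std::uint16_t networkValue = htons(value);
    std::memcpy(buffer + offset, &networkValue, sizeof networkValue);
}
```

On a big-endian host, `htonl` and `htons` return their argument unchanged, so the same code produces the same bytes there.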

If you want to roll your own, the portable way is to use bit shifts e.g.

```c
#include <stdint.h>

// Write number into buffer[0..3], most significant byte first (big-endian).
void writeUInt32ToBufferBigEndian(uint32_t number, uint8_t* buffer)
{
    buffer[0] = (uint8_t) ((number >> 24) & 0xff);
    buffer[1] = (uint8_t) ((number >> 16) & 0xff);
    buffer[2] = (uint8_t) ((number >> 8) & 0xff);
    buffer[3] = (uint8_t) ((number >> 0) & 0xff);
}
```
JeremyP
  • @chqrlie More haste less speed – JeremyP Mar 15 '17 at 14:52
  • Thanks for the help. I will mark your solution as correct, although I need a more general one that works for int/short or even structs with bitfields. I already have a pretty efficient one; all I wanted to know was whether it's really necessary to swap bytes in such a situation. – Joster Mar 15 '17 at 15:19
  • The example code I gave can easily be modified for int and short. Doing structs will necessarily require a roll your own method for each struct type. – JeremyP Mar 15 '17 at 15:27
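As a sketch of that modification, and of the function overloading chux suggests for C++, the shift approach can be written as one overload per width. This assumes `int32_t`/`int16_t` are the types being sent (on common platforms they are `int` and `short`); `writeBigEndian` is a hypothetical name. Structs would still need one such call per field.

```cpp
#include <cstdint>

// Hypothetical overload set: serialize 32- and 16-bit signed values big-endian.
// Each value is first converted to the unsigned type of the same width
// (well-defined, modulo 2^N), so the shifts operate on unsigned values.
inline void writeBigEndian(std::int32_t value, std::uint8_t* buffer) {
    const std::uint32_t u = static_cast<std::uint32_t>(value);
    buffer[0] = static_cast<std::uint8_t>(u >> 24);
    buffer[1] = static_cast<std::uint8_t>(u >> 16);
    buffer[2] = static_cast<std::uint8_t>(u >> 8);
    buffer[3] = static_cast<std::uint8_t>(u);
}

inline void writeBigEndian(std::int16_t value, std::uint8_t* buffer) {
    const std::uint16_t u = static_cast<std::uint16_t>(value);
    buffer[0] = static_cast<std::uint8_t>(u >> 8);
    buffer[1] = static_cast<std::uint8_t>(u);
}
```

The compiler picks the overload from the argument type, so `writeBigEndian(someShort, p)` writes 2 bytes and `writeBigEndian(someInt, p)` writes 4.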

Neither memcpy nor memmove reverses data when copying objects. The byte values you observe when dumping the character array correspond to the way the 32-bit value 15 (0F in hexadecimal) is stored in memory in your environment.

It appears to be in little-endian order, 0F 00 00 00, which is very common in desktop and laptop computers. Other systems, such as IBM mainframes or older PowerPC and SPARC machines, store integer values in big-endian order, 00 00 00 0F, which you consider more natural, but both methods are equally correct. It is just a matter of convention. Little-endian order means the byte with the lowest-value bits is stored first, while big-endian is the opposite: the byte with the highest-value bits is stored first.
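Both conventions can be written out explicitly with shifts, which also shows that the resulting byte sequence is chosen by the code, not by the machine it runs on. A minimal sketch with hypothetical function names `storeLittleEndian` and `storeBigEndian`:

```cpp
#include <cstdint>

// Little-endian: out[i] holds bits 8*i .. 8*i+7, so the least significant
// byte lands at the lowest index.
void storeLittleEndian(std::uint32_t v, unsigned char out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<unsigned char>(v >> (8 * i));
}

// Big-endian: the same bytes in the opposite order, most significant first.
void storeBigEndian(std::uint32_t v, unsigned char out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<unsigned char>(v >> (8 * (3 - i)));
}
```

For the value 15, `storeLittleEndian` fills the array with `0F 00 00 00` and `storeBigEndian` with `00 00 00 0F`, on any host.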

A comprehensive article on Wikipedia covers this subject in depth.

In your application, you must specify in which order the binary value is expected to be stored, and if you decide on big-endian, I suggest you use this code for portability across environments:

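```c
#include <stdint.h>

int foo = 15; // 0x0000000F
unsigned char buffer[100] = { 0 };

...
// Store foo at buffer[offset] in big-endian order, most significant byte first.
// The cast to uint32_t keeps the right shifts well-defined for negative values.
buffer[offset + 0] = ((uint32_t)foo >> 24) & 0xFF;
buffer[offset + 1] = ((uint32_t)foo >> 16) & 0xFF;
buffer[offset + 2] = ((uint32_t)foo >>  8) & 0xFF;
buffer[offset + 3] = ((uint32_t)foo >>  0) & 0xFF;
...
```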
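The receiving side can undo this with the mirror-image expression, so the same shift technique is used on both ends of the transmission, as chqrlie suggests in a comment on the question. A sketch with hypothetical names `readUInt32BigEndian` and `readInt32BigEndian`:

```cpp
#include <cstddef>
#include <cstdint>

// Decode a 32-bit big-endian value from buffer[offset .. offset+3].
// The bytes are combined as unsigned values, so the result is the same
// on little- and big-endian hosts, and no alignment is required.
std::uint32_t readUInt32BigEndian(const unsigned char* buffer, std::size_t offset) {
    return (static_cast<std::uint32_t>(buffer[offset + 0]) << 24) |
           (static_cast<std::uint32_t>(buffer[offset + 1]) << 16) |
           (static_cast<std::uint32_t>(buffer[offset + 2]) << 8)  |
            static_cast<std::uint32_t>(buffer[offset + 3]);
}

// Recover a signed value that was stored via (uint32_t)foo.
// Converting an out-of-range uint32_t to int32_t is implementation-defined
// before C++20, but yields the two's complement value on mainstream compilers.
std::int32_t readInt32BigEndian(const unsigned char* buffer, std::size_t offset) {
    return static_cast<std::int32_t>(readUInt32BigEndian(buffer, offset));
}
```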
chqrlie

On the x86 architecture, integers are stored in memory in little-endian order, lowest byte first. For example, 0x12345678 is stored as 78 56 34 12.


The "easier way" is to stop calling it "reversed". Why, really? 0F is the least-significant part of the multi-byte value and you see it stored at the "less-significant" (i.e. lower) address. Looks perfectly consistent and natural to me. Why would you call it "reversed"?

The only thing that looks "reversed" here is that "strange" original notation of yours 0x0000000F in the comments, where you "for some reason" recorded the bytes in right-to-left order: least significant on the right, more significant on the left.

In other words, the reversal here is entirely a product of your perception/imagination. You, humans, write numbers in right-to-left order but at the same time output bytes (and write C programs) in left-to-right order. The inconsistency between the two is what creates the illusion of reversal in such situations.

AnT stands with Russia
  • I find some incoherence in the fact that if I copy the bytes of a string like "here is my value:" and then an integer value into the same buffer, every byte in the output matches the letters' order except the number. But it may be my perception. – Joster Mar 15 '17 at 14:33
  • @Joster: Um... A string? I don't understand what you are trying to say. If your number is represented as a *string*, as a part of a larger *string*, nothing will be reversed after copying. As for your original example (binary representation), the illusion of reversal comes from your everyday habit to write strings left-to-right and to write numbers "right-to-left" (as explained above). – AnT stands with Russia Mar 15 '17 at 14:35
  • @Joster: The order of digits in a number is a cultural bias. For example `42` is written `4` `2` but pronounced `zweiundvierzig` in German (two and forty). Note also that Arabic is read from right to left, yet the numbers are written in the same *order* as in English, appearing as big-endian to an Arabic reader. – chqrlie Mar 15 '17 at 14:37
  • @chqrlie Sample inconsistency in English: 41 --> "forty-one" (big endian text) and 14 --> "Fourteen" (little endian). ;-) – chux - Reinstate Monica Mar 15 '17 at 15:10
  • @chux: good point! Even singular and plural are culturally biassed: 0.4 seconds in English and 0,4 seconde in French. – chqrlie Mar 18 '17 at 20:20