
Say I have a buffer filled with data and that I got it off the network.

uint8_t buffer[100];

Now imagine that this buffer has different fields. Some are 1 byte, some 2 bytes, and some 4 bytes. All these fields are packed in the buffer.

Now suppose I want to read the value of one of the 16-bit fields. Say the field is stored in the buffer like so:

buffer[2] = first byte of two-byte field
buffer[3] = second byte of two-byte field

I could grab that value like this:

uint16_t* p_val;

p_val = (uint16_t*) &buffer[2];
or
p_val = (uint16_t*) (buffer + 2);

printf("value: %d\n", ntohs(*p_val));

Is there anything wrong with this approach? Or alignment issues I should watch out for?

user1764386
  • Two issues: Endianness and alignment. The former is probably more important, as the compiler should handle the latter for you by just throwing in lots more code. But Endianness you have to handle yourself. – Lee Daniel Crocker Dec 02 '14 at 20:53
  • Assuming I got the buffer from an ethernet packet, would the ntohs when dereferencing solve the endianness issue? – user1764386 Dec 02 '14 at 20:57
  • There's also [strict aliasing](http://stackoverflow.com/a/99010/274261) – ArjunShankar Dec 02 '14 at 20:58
  • @user1764386 No, because you don't know the endian-ness of the coding machine. This is data content? – Weather Vane Dec 02 '14 at 20:58
  • @WeatherVane What do you mean by *endian-ness of the coding machine*? – ArjunShankar Dec 02 '14 at 21:00
  • If you are looking at data content written by a remote computer, how can you tell whether it is coded as MSB-LSB or LSB-MSB? – Weather Vane Dec 02 '14 at 21:01
  • @WeatherVane Depending on the protocol involved, there has to be a standard. For example, IP headers use big-endian. And I'm hoping that any meaningful binary data format mandates an endianness. It would depend on that, not the endian-ness of the machine that sent the packet. – ArjunShankar Dec 02 '14 at 21:05
  • On the other hand, it may be argued that a naive programmer hacking together some networked program on exclusively little/big endian machines might forget to take endian-ness into account and not run into a problem at all. Until they do. – ArjunShankar Dec 02 '14 at 21:07
  • If you get the data from an ethernet packet, then the data format must define the byte order (or else you cannot reliably communicate). That *could be* network byte order, in which case `ntohs()` would be an appropriate way to handle it, but you cannot simply assume network byte order. – John Bollinger Dec 02 '14 at 21:08
  • OK here is why: Suppose someone is transmitting an executable code for an embedded processor via TCP. "The fields might be 1, 2 or 4 bytes." Why would they convert the endian-ness just to satisfy snoopers? – Weather Vane Dec 02 '14 at 21:09
  • I apologize if I caused confusion - I was trying to simplify the problem as much as possible. Say data in the buffer IS in network byte order. My question is more concerned with alignment/aliasing issues than byte order issues. – user1764386 Dec 02 '14 at 21:11
  • The person transmitting would be sure to define an explicit data format (including endianness) and would conform to that format in order to ensure that the reader can receive the transmission correctly. If he is concerned about snoopers then he furthermore encodes the transmission. – John Bollinger Dec 02 '14 at 21:12
  • @WeatherVane Like I said before, it depends on the agreed upon endian-ness and *not* on the machine that "sent the packet". – ArjunShankar Dec 02 '14 at 21:13

1 Answer

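As the comments point out, yes, there are issues with your proposed approach. It may happen to work on a given target machine, but in general it is not safe to access an object through a pointer cast to a different type: `buffer + 2` need not be suitably aligned for a `uint16_t`, and reading `uint8_t` data through a `uint16_t*` violates the strict aliasing rule. (There are exceptions, such as access through a pointer to a character type.)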

To properly take alignment and byte order into consideration, you could do this:

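#include <arpa/inet.h>  /* ntohs() */
#include <stdint.h>
#include <string.h>     /* memcpy() */

/* The union is suitably aligned for every one of its members. */
union convert {
    uint32_t word;
    uint16_t halfword[2];
    uint8_t bytes[4];
} convert;

uint16_t result16;

/* offset is the field's byte position in the buffer, e.g. 2 */
memcpy(convert.bytes, buffer + offset, 2);

/* assuming network byte order: */
result16 = ntohs(convert.halfword[0]);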
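
For a single 16-bit field, the union is not strictly necessary: copying the two bytes straight into a `uint16_t` avoids the alignment and aliasing problems in the same way. A minimal sketch, assuming a POSIX system (for `ntohs()`) and network byte order in the buffer; `load_net16` is a hypothetical helper name, not something from the question:

```c
#include <arpa/inet.h>  /* ntohs() (POSIX) */
#include <stddef.h>     /* size_t */
#include <stdint.h>
#include <string.h>     /* memcpy() */

/* Hypothetical helper: read a 16-bit network-order field that starts
 * at byte `offset` of `buffer`. The caller must make sure that
 * offset + 2 does not exceed the buffer length. */
static uint16_t load_net16(const uint8_t *buffer, size_t offset)
{
    uint16_t raw;

    /* memcpy has no alignment requirement on its source, and `raw`
     * is a real uint16_t object, so no aliasing rule is broken. */
    memcpy(&raw, buffer + offset, sizeof raw);

    /* Convert from network (big-endian) to host byte order. */
    return ntohs(raw);
}
```

With the layout from the question, the call would be `load_net16(buffer, 2)`. Compilers commonly turn a fixed-size `memcpy` like this into a plain load where the hardware allows it, though that is an optimization and not a guarantee.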

If you are in control of the data format, then network byte order is a good choice: `ntohs()` and its relatives do the conversion, so the program does not need to determine, assume, or know the byte order of the machine on which it is running.
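
The same independence from the host's byte order can be had without `ntohs()` by assembling the value from individual bytes with shifts. This is a sketch of that alternative, not part of the approach above; `read_be16` and `read_be32` are hypothetical helper names, and the data is assumed to be big-endian (network byte order):

```c
#include <stdint.h>

/* Hypothetical helpers: decode big-endian fields one byte at a time.
 * Only single bytes are read, so there is no alignment or aliasing
 * concern, and the result is the same on any host byte order. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

For the field in the question this would be `read_be16(buffer + 2)`. If the format were little-endian instead, the byte indices would simply be reversed.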

John Bollinger