Use casts to access a byte-array like a structure?

Question

I'm working on a microcontroller-based software project. A part of the project is a parser for a binary protocol. The protocol is fixed and cannot be changed. A PC is acting as a "master" and mainly transmits commands, which have to be executed by the "slave", the microcontroller board.

The protocol data is received by a hardware communication interface, e.g. UART, CAN or Ethernet. That's not the problem.

After all bytes of a frame (4 - 10, depending on the command) are received, they are stored in a buffer of type "uint8_t cmdBuffer[10]" and a flag is set, indicating that the command can now be executed. The first byte of a frame (cmdBuffer[0]) contains the command, the rest of the frame are parameters for the command, which may differ in number and size, depending on the command. This means, the payload can be interpreted in many ways. For every possible command, the data bytes change their meaning.

I don't want to have much ugly bit operations, but self-documentating code. So my approach is:

I create a "typedef struct" for each command
After determining the command, the pointer to the cmdBuffer is casted to a pointer of my new typedef
by doing so, I can access the command's parameters as structure members, avoiding magic numbers in array acces, bit operations for parameters > 8 bit, and it is easier to read

Example:

typedef struct
{
    uint8_t commandCode;
    uint8_t parameter_1;
    uint32_t anotherParameter;
    uint16 oneMoreParameter;
}payloadA_t;

//typedefs for payloadB_t and payloadC_t, which may have different parameters

void parseProtocolData(uint8_t *data, uint8_t length)
{
  uint8_t commandToExecute;

  //this, in fact, just returns data[0]
  commandToExecute = getPayloadType(data, length);

  if (commandToExecute == COMMAND_A)
  {
    executeCommand_A( (payloadA_t *) data);
  }
  else if (commandToExecute == COMMAND_B)
  {
    executeCommand_B( (payloadB_t *) data);
  }
  else if (commandToExecute == COMMAND_C)
  {
    executeCommand_C( (payloadC_t *) data);
  }
  else
  {
    //error, unknown command
  }
}

I see two problems with this:

First, depending on the microcontroller architecture, the byteorder may be intel or motorola for 2 or 4- byte parameters. This should not be much problem. The protocol itself uses network byte order. On the target controller, a macro can be used for correcting the order.
The major problem: there may be padding bytes in my tyepdef struct. I'm using gcc, so i can just add a "packed"-attribute to my typedef. Other compilers provide pragmas for this. However, on 32-bit machines, packed structures will result in bigger (and slower) machine code. Ok, this may also be not a problem. But I'v heard, there can be a hardware fault when accessing un-aligned memory (on ARM architecture, for example).

There are many commands (around 50), so I don't want access the cmdBuffer as an array I think the "structure approach" increases code readability in contrast to the "array approach"

So my questions:

Is this approach OK, or is it just a dirty hack?
are there cases where the compiler can rely on the "strict aliasing rule" and make my approach not work?
Is there a better solution? How would you solve this problem?
Can this be kept, at least a little, portable?

Regards, lugge

> can be a hardware fault when accessing un-aligned memory - just try it. I used packed structs on MIPS and everything was good, for example. — someuser, Apr 25 '14 at 06:18
Also if you are concerned about the problems with the alignment and do not want to use a "packed"-attribute, you can write a function to parse the raw data into a structure instead of direct casting. — someuser, Apr 25 '14 at 06:30
someuser: thats the approach i try to avoid. Writing parsing code for about 50 commands would be... ugly. Thats way I would like to use the structure casting. This also has the advantage that a member describes with its name what the parameter stands for. — lugge86, Apr 25 '14 at 08:41

score 3 · Answer 1 · answered Apr 25 '14 at 06:59

Generally, structs are dangerous for storing data protocols because of padding. For portable code, you probably want to avoid them. Keeping the raw array of data is therefore the best idea still. You only need a way to interpret it differently depending on the received command.

This scenario is a typical example where some kind of polymorphism is desired. Unfortunately, C has no built-in support for that OO feature, so you'll have to create it yourself.

The best way to do this depends on the nature of these different kinds of data. Since I don't know that, I can only suggest on such way, it may or may not be optimal for your specific case:

typedef enum
{
  COMMAND_THIS,
  COMMAND_THAT,
  ... // all 50 commands

  COMMANDS_N // a constant which is equal to the number of commands
} cmd_index_t;


typedef struct
{
  uint8_t      command;        // the original command, can be anything
  cmd_index_t  index;          // a number 0 to 49
  uint8_t      data [MAX];     // the raw data
} cmd_t;

Step one would then be that upon receiving a command, you translate it to the corresponding index.

// ...receive data and place it in cmdBuffer[10], then:
cmd_t cmd;
cmd_create(&cmd, cmdBuffer[0], &cmdBuffer[1]);

...

void cmd_create (cmd_t* cmd, uint8_t command, uint8_t* data)
{
   cmd->command = command;
   memcpy(cmd->data, data, MAX);

   switch(command)
   {
     case THIS: cmd->index = COMMAND_THIS; break;
     case THAT: cmd->index = COMMAND_THAT; break;
     ... 
   }
}

Once you have an index 0 to N means that you can implement lookup tables. Each such lookup table can be an array of function pointers, which determine the specific interpretation of the data. For example:

typedef void (*interpreter_func_t)(uint8_t* data);

const interpreter_func_t interpret [COMMANDS_N] =
{
  &interpret_this_command,
  &interpret_that_command,
  ...
};

Use:

interpret[cmd->index] (cmd->data);

Then you can make similar lookup tables for different tasks.

   initialize [cmd->index] (cmd->data);
   interpret  [cmd->index] (cmd->data);
   repackage  [cmd->index] (cmd->data);
   do_stuff   [cmd->index] (cmd->data);
   ...

Use different lookup tables for different architectures. Things like endianess can be handled inside the interpreter functions. And you can of course change the function prototypes, maybe you need to return something or pass more parameters etc.

Note that the above example is most suitable when all commands result in the same kind of actions. If you need to do entirely different things depending on command, other approaches are more suitable.

score 2 · Answer 2 · answered Apr 25 '14 at 06:43

IMHO it is a dirty hack. The code may break when ported to a system with different alignment requirements, different variable sizes, different type representations (e.g. big endian / little endian). Or even on the same system but different version of compiler / system headers / whatever.

I don't think it violates strict aliasing, so long as the relevant bytes form a valid representation.

I would just write code to read the data in a well-defined manner, e.g.

bool extract_A(PayloadA_t *out, uint8_t const *in)
{
    out->foo = in[0];
    out->bar = read_uint32(in + 1, 4);
    // ...
}

This may run slightly slower than the "hack" version, it depends on your requirements whether you prefer maintenance headaches, or those extra microseconds.

The problem with the endianess can be resolved by using macros, which expand to either just the value, or if necessary, to a short code which arranges the bytes. Padding can avoided with nearly all compilers. But I'm not sure if then the structure members can be accessed on all platforms. However, an uint8_t array does not contain padding and can be accessed... what about that? — lugge86, Apr 25 '14 at 08:38

score 1 · Answer 3 · edited May 23 '17 at 12:11

Answering your questions in the same order:

This approach is quite common, but it's still called a dirty hack by any book I know that mentions this technique. You spelled the reasons out yourself: in essence it's highly unportable or requires a lot of preprocessor magic to make it portable.
strict aliasing rule: see the top voted answer for What is the strict aliasing rule?
The only alternative solution I know is to explicitly code the deserialization as you mentioned yourself. This can actually be made very readable like this:

uint8_t *p = buffer; struct s; s.field1 = read_u32(&p); s.field2 = read_u16(&p);

I. E. I would make the read functions move the pointer forward by the number of deserialized bytes.

As said above, you can use the preprocessor to handle different endianness and struct packing.

score 1 · Answer 4 · answered Apr 25 '14 at 14:19

It's a dirty hack. The biggest problem I see with this solution is memory alignment rather than endianness or struct packing.

The memory alignment issue is this. Some microcontrollers such as ARM require that multi-byte variables be aligned with certain memory offsets. That is, 2-byte half-words must be aligned on even memory addresses. And 4-byte words must be aligned on memory addresses that are multiples of 4. These alignment rules are not enforced by your serial protocol. So if you simply cast the serial data buffer into a packed structure then the individual structure members may not have the proper alignment. Then when your code tries to access a misaligned member it will result in an alignment fault or undefined behavior. (This is why the compiler creates an un-packed structure by default.)

Regarding endianness, it sounds like your proposing to correct the byte-order when your code accesses the member in the packed structure. If your code accesses the packed structure member multiple times then it will have to correct the endianness every time. It would be more efficient to just correct the endianness once, when the data is first received from the serial port. And this is another reason not to simply cast the data buffer into a packed structure.

When you receive the command, you should parse out each field individually into an unpacked structure where each member is properly aligned and has the proper endianness. Then your microcontroller code can access each member most efficiently. This solution is also more portable if done correctly.

score 0 · Answer 5 · edited Apr 25 '14 at 08:19

0

Yes this is the problem of memory alignment.

Which controller you are using ?

Just declare the structure along with following syntax,

__attribute__(packed)

may be it will solve your problem.

Or you can try to access the variable as reference by address instead of reference by value.

edited Apr 25 '14 at 08:19

Lundin

195,001
40
254
396

answered Apr 25 '14 at 07:09

Parthiv Shah

350
3
17

This is a GCC extension and not portable. Removing padding is perhaps not a solution anyhow, since some platforms require it and the struct cannot function without padding. – Lundin Apr 25 '14 at 08:19
Removing padding should be common along C compilers nowadays, however, the syntax may differ. Green Hills for example uses pragmas to do this. – lugge86 Apr 25 '14 at 08:36
which compiler are you using ? because i have added this with my PIC32 based project. where i have used c32 compiler and MPLAB IDE – Parthiv Shah Apr 25 '14 at 11:07

Use casts to access a byte-array like a structure?

5 Answers5