C: Elegant way to decode array of char (deserialize) to struct?

Question

So I'm currently designing a simple transmission protocol for a ring network implemented in UART.

To transmit, I am converting the data from a struct to a char stream, started by < and terminated by > - I know of the possibility of any 0x3c or 0x3e value being mistaken for a < or > respectively, i am working on a solution to escape that. That's not part of my question.

So, the structure of it is something like <UINT32UINT32UINT8UINT8UINT16char[0...n]>, those types representing: <destinationId senderId TimeToLive Cmdtype CmdId Payloadlength Payload>. This always stays the same, so I can assume it without any delimiters between the values. This works, and I can theoretically also decode this. To easily access the bytes, I implemented the struct with unions:

typedef struct{
    union{
        uint32_t val;
        char bytes[sizeof(uint32_t)];
    } recipientId;
    union{
        uint32_t val;
        char bytes[sizeof(uint32_t)];
    } senderId;
    union{
        uint8_t val;
        char bytes[sizeof(uint8_t)];
    } timeToLive;
    union{
        uint8_t val;
        char bytes[sizeof(uint8_t)];
    } cmdType;
    union{
        uint8_t val;
        char bytes[sizeof(uint8_t)];
    } cmdId;
    union{
        uint16_t val;
        char bytes[sizeof(uint16_t)];
        } payloadLength;
    char *payload;
    char *commandRaw;
} aurPacket_t;

Once a packet exists, I decode it with something akin to this:

void decode_command(aurPacket_t packet){
    if((packet.commandRaw[0] != '<' ) || (packet.commandRaw[strlen(packet.commandRaw) - 1]  != '>') ){
        printf("\n\nINVALID COMMAND\n\n");
    }
    else{
        aurPacket_t newpacket;
        // EITHER:
//      for (int i = 0; i < strlen(newpacket.recipientId.bytes); i++){
//          newpacket.recipientId.bytes[i] = (char)*(packet.commandRaw + 1 + i);
//      }

        // OR:
        strncpy(newpacket.recipientId.bytes, (packet.commandRaw + 1), sizeof(newpacket.recipientId.bytes));
    }
}

commandRaw contains the char stream that would be received in a message.

Using something like this I'd be able to do it, but I'd need to iterate it one by one since not all values are the same datatype - copying the string out to my payload in the end. Is there a way to make this more elegant than iterating through every single variable, to somehow iterate by using my struct as a guide for the iterator?

I was made aware of memcpy, but since I want to keep the protocol platform-independent, I'd prefer not to use it unless there is an elegant way to. Or is there a way to use it elegantly? Is my method of adding the variables even different from just using memcpy? Thinking about it, it doesn't seem like it would be, considering the ordering of the vars inside my struct. If I made a string containing <, memcpy-appended everything up to the payload, then memcpy-appended the payload, then appended a >, would there be any actual difference in the data? If not, I could just use that process in reverse to decode a message.

I'm currently encoding the message by using this function:

#define RAW_PACKET_LEN 1024
void parse_command(aurPacket_t packet){
    snprintf(packet.commandRaw, RAW_PACKET_LEN, "<%.4s%.4s%.1s%.1s%.1s%.02s%.*s>",
                                    packet.recipientId.bytes,
                                    packet.senderId.bytes,
                                    packet.timeToLive.bytes,
                                    packet.cmdType.bytes,
                                    packet.cmdId.bytes,
                                    packet.payloadLength.bytes,
                                    packet.payloadLength.val, packet.payload
                                    );
} // memory for commandRaw allocated outside of function

which has the problem of not actually writing 0x00-bytes to the stream, but that's not part of the question - I'm looking for an answer to that at the moment (of course, if you know an easy fix, feel free to let me know :) )

The system is an ESP32 programmed using ESP-IDF.

"but since I want to keep the protocol platform-independent, I'd prefer not to use it" As in, you want the struct to work portably, regardless of padding? — Lundin, May 07 '20 at 07:59

score 1 · Answer 1 · answered May 12 '20 at 20:03

Some tips here:

Your packet should not contain a pointer, it would point to an address that only the sender would know about. You want to actually copy the array into the command (see 4 below).
“char bytes[sizeof(uint8_t)]” is literally preprocessed at compile time into “char bytes[1]”, so you might as well just have “char byte”, which is singular.
If you are just using a union for a uint8_t and a char, don’t bother just cast it before you use it: (eg; printf(“%c”,(char)val);
You will save yourself a ton of headache if you can just agree on fixed packet size. It looks like the only thing that will be dynamic is the payload, and the commandraw. Pick your worst case and go with that. I know that you are using serial, but unless you are doing something other than 8N1 it won’t have checks anyway, so you don’t want anything too long. I suggest you pick a total length under 1472 incase you move to UDP/TCP one day.
Assuming you can do these things, copying the data out is a piece of cake. You make your command as a struct. Then you make a union which contains the command as the first member and an array the size of the command as the second member. For example I would used uint8s (you could use chars).

union CommandMsg{ //you could also define this ahead of time..inlining b/c lazy. struct Command{ uint32_t recipientId; uint32_t senderId; uint8_t val; char timeToLive; //why char? uint8_t cmdType; uint8_t cmdId; uint16_t val; uint16_t payloadLength; //maybe just strlen later if term'd? char payload[256]; char commandRaw[256]; } asCommand; uint8_t asByteArray[sizeof(Command)]; } commandMsg;
Safety first, before you stuff your command struct memcpy zeros into the whole thing. The zeros will act like terminators for any strings you memcpy later.
Make sure to __pack your struct
When you copy it back out cast your arrays to void pointer memcpy((void *)localCopy, (const void *)incoming, sizeof(CommandMsg));

score 0 · Answer 2 · answered May 07 '20 at 08:49

Not really sure what counts as "elegant", but here are some diverse solutions.

The old-fashioned C way to deal with something like this would be to create a look-up table. Given that you only have a static "singleton" buffer static aurPacket_t aur_packet;, then you can create a pointer look-up table at compile-time:

#define AUR_PACKET_MEMBERS_N 6

static char* const AUR_PACKET_LOOKUP[AUR_PACKET_MEMBERS_N] =
{
  aur_packet.recipientId.bytes,
  aur_packet.senderId.bytes,
  aur_packet.timeToLive.bytes,
  aur_packet.cmdType.bytes,
  aur_packet.cmdId.bytes,
  aur_packet.payloadLength.bytes,
};

And now you can iterate over each part of the struct by accessing AUR_PACKET_LOOKUP[i] and get a char* to its bytes member.

A more radical (and not necessarily readable) approach would be "X macros", creating something like

#define AUR_PACKET_LIST    \
  X(UINT32, recipientId)   \
  X(UINT32, senderId)      \
  X(UINT8,  timeToLive)    \
  X(UINT8,  cmdType)       \
  X(UINT8,  cmdId)         \
  X(UINT16, payloadLength) \

You can now generate the string at compile-time, without involving sprintf functions:

strcpy(packet.commandRaw, 
  #define X(type, member) #type
    AUR_PACKET_LIST
  #undef X
);

Which expands to strcpy(packet.commandRaw, "UINT32" "UINT32" ... and the pre-processor concatenates the string literals from there.

And finally, it's perfectly possible to go completely macro ape and use X macros to define the type itself too, which I don't really recommend unless you have severe requirements to avoid code repetition:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef uint8_t  UINT8;
typedef uint16_t UINT16;
typedef uint32_t UINT32;

#define AUR_PACKET_LIST(X) \
/*  type    name        */ \
  X(UINT32, recipientId)   \
  X(UINT32, senderId)      \
  X(UINT8,  timeToLive)    \
  X(UINT8,  cmdType)       \
  X(UINT8,  cmdId)         \
  X(UINT16, payloadLength) \

#define AUR_PACKET_DECL_MEMBER(type, name) \
  union {                                  \
    type val;                              \
    char bytes[sizeof(type)];              \
  } name;

typedef struct{
    AUR_PACKET_LIST(AUR_PACKET_DECL_MEMBER)
    char *payload;
    char *commandRaw;
} aurPacket_t;

int main(void) 
{
  aurPacket_t packet = {.commandRaw=malloc(256)};
  packet.commandRaw[0]='\0';

  #define AUR_PACKET_COMMANDRAW(type, name) #type
  strcpy(packet.commandRaw, AUR_PACKET_LIST(AUR_PACKET_COMMANDRAW));

  puts(packet.commandRaw);
  return 0;
}

Alex · Answer 3 · 2020-05-08T07:13:48.120

0

Your struct looks redundant. It's better to use something like this:

typedef struct {
    uint8_t head;
    uint32_t recipientId;
    uint32_t senderId;
    uint8_t timeToLive;
    uint8_t cmdType;
    uint8_t cmdId;
    uint16_t payloadLength;
    uint8_t *payload;
    uint8_t tail;
} aurPacket_t;

Anyway you should deal with buffer to have memory for payload. It can be global array with fixed size, it's better for embedded software. memcpy is good solution when you controls all the sizes you operates with, because it's usually optimized for target hardware. You should use only fixed size types, like uint8_t and uint32_ for your data members in your packet to make it cross-platform and remember about big- or little-endian on your machines, when you storing data to packet.

Also it's good practice to make check sum a part of your packet to validate data.

edited May 08 '20 at 07:13

answered May 07 '20 at 09:17

Alex

61
4

`aurPacket_t *packet = &buffer[i];` is not only wrong and causes repeated misaligned reads, it also invokes undefined behavior for strict aliasing violations. – Lundin May 07 '20 at 14:43
Can you please explain, what you mean? If you have fixed size variables in structure, how can it be possible to get wrong alignment? – Alex May 08 '20 at 05:41
1

Because each time in the loop you tell the program to increase the base address of the struct by 1 byte. `aurPacket_t *packet = &buffer[i];` is not even valid C because it's an invalid pointer conversion. This whole code is just plain wrong in multiple ways. – Lundin May 08 '20 at 06:30
Yes, I added type conversion ``aurPacket_t *packet = (aurPacket_t*)&buffer[i];``. It's better to have different types and for header, tail and payload to have more reliable way for conversions. And it's too much code for such small discussion. – Alex May 08 '20 at 06:58
You don't get what I'm saying. It's still reading the struct misaligned every lap in the loop! And there is [strict aliasing](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule). And you shouldn't modify the loop iterator from inside a for loop ever, that's how hopeless spaghetti is made. Why do you for example increase `i` by 2 each lap? This code can't be salvaged. – Lundin May 08 '20 at 07:02
1

Ha! I found this, I thought I doing while loop. And increased ``i`` by two. Let's remove it. Maybe for better times when I can pay more attention for it. – Alex May 08 '20 at 07:20

C: Elegant way to decode array of char (deserialize) to struct?

3 Answers3