How to code ASCII Text Based protocol over RS-232 in C

Question

I have to implement a relatively simple communication protocol on top of RS-232. It's an ASCII based text protocol with a couple of frame types.

Each frame looks something like this:

 * ___________________________________
 * |     |         |         |        |
 * | SOH |   Data  | CRC-16  | EOT    |
 * |_____|_________|_________|________|
 *   1B    nBytes      2B       1B

Start Of Header (1 Byte)
Data (n-Bytes)
CRC-16 (2 Bytes)
EOT (End Of Transmission)

Each data-field needs to be separated by semicolon ";": for example, for HEADER type data (contains code,ver,time,date,src,id1,id2 values):

{code};{ver};{time};{date};{src};{id1};{id2}

My question is: what is the most elegant way of implementing this in C?

I have tried defining multiple structs for each type of frame, for example:


typedef struct {
    uint8_t soh;
    char code;
    char ver;
    Time_t time;
    Date_t date;
    char src; // Unsigned char
    char id1[20]; // STRING_20
    char id2[20]; // STRING_20
    char crlf;
    uint16_t crc;
    uint8_t eot;
} stdHeader_t;

I have declared a global buffer:

uint8_t DATA_BUFF[BUFF_SIZE];

I then have a function sendHeader(), in which I want to use the RS-232 send function to send everything byte by byte, by casting DATA_BUFF to the header struct and filling out the struct:

static enum_status sendHeader(handle_t *handle)
{
    uint16_t len;
    enum_RETURN_VALUE rs232_err = OK;
    enum_status err = STATUS_OK;

    stdHeader_t *header = (stdHeader_t *)DATA_BUFF;

    memset(DATA_BUFF, 0, sizeof(stdHeader_t));

    header->soh  = SOH;
    header->code = HEADER;
    header->ver  = 10; // TODO
    header->time = handle->time;
    header->date = handle->date;
    header->src  = handle->config->source;
    // Copy the ID strings (memcpy, not memset); each struct field holds at most 20 bytes
    memcpy(header->id1, handle->config->id1, strlen(handle->config->id1));
    memcpy(header->id2, handle->config->id2, strlen(handle->config->id2));
    header->crlf = '\r\n'; // NOTE: multi-character constant; a single char cannot hold both CR and LF
    header->crc  = calcCRC();
    header->eot  = EOT;

    len = sizeof(stdHeader_t );

    do
    {
        for (uint16_t i = 0; i < len; i++) 
        {
            rs232_err= rs232_tx_send(DATA_BUFF[i], 1); // Send one byte
            if (rs232_err!= OK)
            {
                err = STATUS_ERR;
                break;
            }
        }
        // Break do-while loop if there is an error
        if (err == STATUS_ERR)
        {
            break;
        }
    } while (conditions); 


    return err;
}

My problem is that I do not know how to approach the problem of handling ascii text based protocol, the above principle would work very well for byte based protocols.

Also, I do not know how to implement the semicolon ";" separation of data in the above snippet. As everything is sent byte by byte, I would need additional logic to know when to send ";", and with the current implementation that would not look very good.

For fields id1 and id2, I am receiving string values as part of handle->config. They can be of any length, up to a maximum of 20. Because of that, with the current implementation I would be sending more than needed whenever the actual length is less than 20, but I cannot use pointers to char inside the struct, because in that case only the pointer value would get sent.

So to summarize, the main question is:

How to implement the above described text based protocol for rs-232 in a nice and proper way?

No idea why this was closed as too broad. There are several questions but they are related. I was almost done writing an answer... voting to re-open. In general, can people who have no clue about embedded systems kindly step away from [tag:embedded]? — Lundin, Nov 10 '22 at 15:08
You need to write functions to _serialize_ the C data structures as ASCII strings in the given format, and functions to _deserialize_ strings into C data structures (this involves recognizing which structure should be used, i.e. the data type). — hyde, Nov 10 '22 at 15:09
By what definition could that be considered a "text based" protocol? It is clearly binary. NMEA0183 is an example of a text protocol - it is line based, uses printable ASCII delimiters and ASCII hexadecimal checksums. Yours has none of those attributes. — Clifford, Nov 13 '22 at 14:15

score 3 · Answer 1 · chux - Reinstate Monica · 2022-11-10T18:13:41.980

what is the most elegant way of implementing this (ASCII Text Based protocol) in C is my question?

Since this is ASCII, avoid endian issues of trying to map a multi-byte integer. Simply send an integer (including char) as decimal text. Likewise for floating point, use exponential notation and sufficient precision. E.g. sprintf(buf, "%.*e", DBL_DECIMAL_DIG-1, some_double);. Allow "%a" notation.
Do not use the same code for SOH and EOT. Different values reduce receiver confusion.
Send date and time using ISO 8601 as your guide. E.g. "2022-11-10", "23:38:42".
Send each string with a leading/trailing ". Escape non-printable ASCII characters, and ", \, ;. Example for a 10-character string 123\\;\"\xFF456 --> "123\\\;\"\xFF456".
Error check, like crazy, the received data. Reject packets of data for all sorts of reasons: field count wrong, string too long, value outside field range, bad CRC, timeout, any non-ASCII character received.
Use ASCII hex characters for CRC: 4 hex characters instead of 2 bytes.
Consider a CRC 32 or 64.
Any out-of-band input (bytes before receiving a SOF) is silently dropped. This nicely allows an optional LF after each command.
Consider the only characters between SOH/EOT should be printable ASCII: 32-126. Escape others as needed.
Since "it's an ASCII based text protocol with a couple of frame types.", I'd expect a type field.
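As an illustration of the list above, the fields can be formatted as text first and then wrapped in the frame delimiters. The following is only a minimal sketch under stated assumptions: `format_header` and `wrap_frame` are hypothetical names, SOH and EOT are assumed to be the ASCII control codes 0x01 and 0x04, the time and date are assumed to arrive as already formatted strings, and the CRC is assumed to be sent as 4 uppercase hex characters as suggested above. The CRC value itself is computed by the caller over whatever bytes the protocol specifies.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SOH 0x01 /* assumption: ASCII Start Of Header */
#define EOT 0x04 /* assumption: ASCII End Of Transmission */

/* Formats the HEADER fields as "code;ver;time;date;src;id1;id2".
 * id1/id2 are limited to 20 characters by the "%.20s" precision.
 * Returns the text length, or -1 if it does not fit in out[cap]. */
int format_header(char *out, size_t cap,
                  unsigned code, unsigned ver,
                  const char *time_str, const char *date_str,
                  unsigned src, const char *id1, const char *id2)
{
    int n = snprintf(out, cap, "%u;%u;%s;%s;%u;%.20s;%.20s",
                     code, ver, time_str, date_str, src, id1, id2);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Wraps already formatted data as SOH + data + 4 hex CRC chars + EOT.
 * Returns the frame length (excluding the trailing NUL), or 0 on error. */
size_t wrap_frame(char *out, size_t cap, const char *data, uint16_t crc)
{
    int n = snprintf(out, cap, "%c%s%04X%c", SOH, data, (unsigned)crc, EOT);
    return (n < 0 || (size_t)n >= cap) ? 0 : (size_t)n;
}
```

The resulting buffer can then be handed to the UART send routine in one call or byte by byte. The semicolons come from the format string, so no per-byte logic is needed to decide where they go.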

See What type of framing to use in serial communication for more ideas.
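The error-checking advice in this answer can be sketched for the receiving side as well. This is a minimal sketch, not a definitive implementation: `check_frame` and the status names are hypothetical, SOH/EOT are assumed to be 0x01/0x04, and the CRC is assumed to arrive as 4 uppercase hex characters directly before EOT, which may not match the actual protocol. The function only extracts the received CRC; comparing it against a CRC computed over the data is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define SOH 0x01 /* assumption: ASCII Start Of Header */
#define EOT 0x04 /* assumption: ASCII End Of Transmission */

typedef enum {
    FRAME_OK,
    FRAME_TOO_SHORT,
    FRAME_BAD_DELIM,
    FRAME_NON_PRINTABLE,
    FRAME_BAD_FIELD_COUNT,
    FRAME_BAD_CRC_TEXT
} frame_status_t;

/* Validates SOH + data + 4 hex CRC chars + EOT and counts ';' separated fields.
 * On success, stores the CRC parsed from the hex text in *crc_out. */
frame_status_t check_frame(const char *f, size_t len,
                           unsigned expected_fields, uint16_t *crc_out)
{
    if (len < 1 + 4 + 1)
        return FRAME_TOO_SHORT;
    if (f[0] != SOH || f[len - 1] != EOT)
        return FRAME_BAD_DELIM;

    size_t crc_start = len - 5;            /* index of the first CRC character */
    unsigned fields = 1;
    for (size_t i = 1; i < crc_start; i++) {
        unsigned char c = (unsigned char)f[i];
        if (c < 32 || c > 126)              /* only printable ASCII allowed */
            return FRAME_NON_PRINTABLE;
        if (c == ';')
            fields++;
    }
    if (fields != expected_fields)
        return FRAME_BAD_FIELD_COUNT;

    uint16_t crc = 0;
    for (size_t i = crc_start; i < len - 1; i++) {
        char c = f[i];
        unsigned d;
        if (c >= '0' && c <= '9')      d = (unsigned)(c - '0');
        else if (c >= 'A' && c <= 'F') d = (unsigned)(c - 'A') + 10u;
        else return FRAME_BAD_CRC_TEXT;
        crc = (uint16_t)((crc << 4) | d);
    }
    *crc_out = crc;
    return FRAME_OK;
}
```

Returning a distinct status per failure makes it easier to log why a frame was rejected, which helps when the two ends get out of sync.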

edited Nov 10 '22 at 18:13

answered Nov 10 '22 at 17:52

chux - Reinstate Monica

Thanks, this is useful information. 1) SOH and EOT are different 2) Date format is specified by the protocol I am trying to implement 3) I need to send multiple strings in the exact way they appear, but separated with semicolons, for example: "code-text";"source-text";"version-text"; etc.... It looks simple, but I am completely lost on how to implement this behaviour elegantly with any string length – wirelabs Nov 10 '22 at 18:51
@wirelabs I saw the `1B` in `* | SOH | Data | CRC-16 | EOT ... | * 1B nBytes 2B 1B` and thought SOH/EOT were both the escape character 0x1B. I now see this refers to 1 byte. – chux - Reinstate Monica Nov 10 '22 at 18:55
@wirelabs "I need to send multiple strings in the exact way they appear, but separated with semicolons" You do not _need_ to send in the exact way. You can encode and then send in many different ways. What is important is for the receiver to decode and re-create exactly the same string. This same issue applies to date/time: the format of the data frame is not required to be the same as `Date_t / Time_t`. – chux - Reinstate Monica Nov 10 '22 at 19:00
The problem is that I don't have control of the receiver side. I only know the behaviour of the protocol, and need to implement it in an exactly specified way – wirelabs Nov 10 '22 at 19:01
@wirelabs "I don't have control of the receiver side." --> without clearly posting the receiver side code/definition, we are coding in the dark. – chux - Reinstate Monica Nov 10 '22 at 19:09
I think it's pretty clearly defined in the question post, what other details would you require? – wirelabs Nov 10 '22 at 19:11
@wirelabs Are the sender/receiver always the same endian (integer byte order)? What happens when SOH or EOT occur in `ID1[], ID2[]`? How does the receiver handle timeouts, CRC errors, short packets (EOT occurred sooner than expected, like in the CRC)? How does the receiver determine a SOF is not a CRC byte? Does `Date_t, Time_t` potentially contain padding? How does the receiver handle an out of range date/time? What happens when `code`, `ver` or other data bytes have the value of `';'`? – chux - Reinstate Monica Nov 10 '22 at 19:23
@wirelabs As is, the apparent receive protocol is poor and weakly defined. Just populate `DATA_BUFF` one member at a time to avoid padding and use network endian to cope with variant byte order and hope your receiver never gets out of sync. – chux - Reinstate Monica Nov 10 '22 at 19:24
"Use ASCII hex characters for CRC: 4 hex characters instead of 2 bytes." Uh, no don't do that. CRC polynomials are picked to resemble unlikely error sequences. If you convert them to ASCII you lose that advantage. As for how big part the polynomial actually plays, it seems fairly subjective and depending on what they were originally designed for. CRC-16-CCITT seems to have the best reputation regarding this, at least in the context of UART-based buses. – Lundin Nov 10 '22 at 20:48
@Lundin If `EOT` or `SOF` are possible bytes that may occur in other places like the CRC, the proper beginning of the command or its end may be misinterpreted. I do not agree as ASCII any significant advantage is lost. IMO, for an "ASCII Text Based protocol", non-ASCII should not appear. Based on other OP comments, the description as _ASCII Text Based protocol_ is amiss and has led to an unclear problem. – chux - Reinstate Monica Nov 10 '22 at 21:08
CRC is calculated and then converted to string as per protocol requirements – wirelabs Nov 11 '22 at 06:26
@wirelabs The post has CRC as _2B_. I assume that is 2 bytes which nicely fits a 16-bit CRC. You say "CRC is calculated and then converted to string". So is that a _string_ like C defines as characters with a final _null character_, so maybe 5 characters like `'A'`, `'B'`, `'4'`, `'2'`, `'\0'` or what? – chux - Reinstate Monica Nov 11 '22 at 07:37
@chux-ReinstateMonica "I do not agree as ASCII any significant advantage is lost" Well, you are wrong. [A Painless guide to CRC](https://ceng2.ktu.edu.tr/~cevhers/ders_materyal/bil311_bilgisayar_mimarisi/supplementary_docs/crc_algorithms.pdf), see chapter 7. When you convert the FCS to ASCII, you throw all of that out the window and you might as well use some naive checksum like counting ones or summing bytes. The whole point of CRC is to avoid something looking like common error patterns. – Lundin Nov 11 '22 at 07:50
Null termination should be dropped from all data, including crc – wirelabs Nov 11 '22 at 12:18
@wirelabs If null termination should be dropped from all data, then that does not meet the C definition of _string_ which always, by definition, contains a trailing 0. [CRC is calculated and then converted to string](https://stackoverflow.com/questions/74391012/how-to-code-ascii-text-based-protocol-over-rs-232-in-c/74393363?noredirect=1#comment131339814_74393363) is amiss. Say the 16-bit CRC was 0x0123. How many and what bytes would make up the CRC-16 of your protocol? – chux - Reinstate Monica Nov 11 '22 at 16:09

score 2 · Answer 2 · answered Nov 10 '22 at 15:18

First of all, structs are really not good for representing data protocols. The struct in your example will be filled to the brim with padding bytes everywhere, so it is not a proper nor portable representation of the protocol. In particular, forget all about casting a struct to/from a raw uint8_t array - that's problematic for even more reasons: the first address alignment and pointer aliasing.

In case you insist on using a struct, you must write serialization/deserialization routines that manually copy to/from each member into the raw uint8_t buffer, which is the one that must be used for the actual transmission.
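A minimal sketch of such routines is shown here, assuming a simplified, hypothetical `header_t` and a big-endian wire format; neither is taken from the actual protocol. Each member is copied individually, so compiler padding and host byte order never reach the wire.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified header used only for illustration. */
typedef struct {
    uint8_t  code;
    uint8_t  ver;
    char     id1[20];
    uint16_t crc;
} header_t;

/* Bytes on the wire: code + ver + id1 + crc, with no padding. */
#define HEADER_WIRE_SIZE (1u + 1u + 20u + 2u)

/* Copies each member into buf in wire order; the CRC is written big-endian.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t header_serialize(const header_t *h, uint8_t *buf, size_t cap)
{
    if (cap < HEADER_WIRE_SIZE)
        return 0;
    size_t pos = 0;
    buf[pos++] = h->code;
    buf[pos++] = h->ver;
    memcpy(buf + pos, h->id1, sizeof h->id1);
    pos += sizeof h->id1;
    buf[pos++] = (uint8_t)(h->crc >> 8);   /* most significant byte first */
    buf[pos++] = (uint8_t)(h->crc & 0xFF);
    return pos;
}

/* Rebuilds the struct from wire bytes. Returns 0 on success, -1 if too short. */
int header_deserialize(header_t *h, const uint8_t *buf, size_t len)
{
    if (len < HEADER_WIRE_SIZE)
        return -1;
    size_t pos = 0;
    h->code = buf[pos++];
    h->ver  = buf[pos++];
    memcpy(h->id1, buf + pos, sizeof h->id1);
    pos += sizeof h->id1;
    h->crc = (uint16_t)(((unsigned)buf[pos] << 8) | buf[pos + 1]);
    return 0;
}
```

Only the `uint8_t` buffer is passed to the transmit routine; the struct stays an in-memory convenience and its layout no longer matters.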

(De)serialization routines might not be such a bad idea anyway, because of another issue not addressed by your post: network endianness. RS-232 protocols are by tradition almost always Big Endian, but don't count on it - endianness must be documented explicitly.

My problem is that I do not know how to approach the problem of handling ascii text based protocol, the above principle would work very well for byte based protocols.

That is a minor problem compared to the above. Often it is acceptable to have a mix of raw data (essentially everything but the data payload) and ASCII text. If you want a pure ASCII protocol you could consider something like "AT commands", but they don't have much in the way of error handling. You really should have a CRC16 as well as sync bytes. Hint: preferably pick the first sync byte as something that doesn't match 7 bit ASCII, that is, something with the MSB set. 0xAA is popular.
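For the CRC16 mentioned above, a minimal bitwise sketch follows. It assumes the CRC-16/CCITT-FALSE variant (polynomial 0x1021, initial value 0xFFFF, no bit reflection, no final XOR); the actual protocol must state which parameters it uses, and `crc16_ccitt` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, MSB first.
 * Assumption: the protocol in question may use different CRC parameters. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned)data[i] << 8);   /* feed next byte into the top bits */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

A quick sanity check for this variant is the ASCII string "123456789", whose published check value is 0x29B1. A table-driven version is faster but computes the same result.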

Once you've sorted out data serialization, endianness and protocol structure, you can start to worry about details such as string handling in the payload part.

And finally, RS232 is dinosaur stuff. There are not many reasons why one shouldn't use RS422/RS485. The last argument for using RS232, "computers come with RS232 COM ports", went obsolete some 15-20 years back.

"RS-232 protocols are by tradition almost always Big Endian" --> BE is certainly common - and perhaps most common, yet _almost always_ overstates. Many protocols use LE. Example: serialized [ISO 15693](https://en.wikipedia.org/wiki/ISO/IEC_15693#Communication_to_the_card). Note: RS-232 is little endian at the bit level. LSbit goes first. — chux - Reinstate Monica, Nov 10 '22 at 18:18
@chux-ReinstateMonica Big Endian makes it possible to implement CRC in hardware using XOR gates, although ideally it would also be MSB first for that. Well anyway, the important part is that one should never assume one endianness or the other. — Lundin, Nov 10 '22 at 20:44

score 0 · Answer 3 · answered Nov 10 '22 at 15:43

One thing your struct implementation is missing is packing. For efficiency reasons, depending on which processor your code is running on, the compiler will add padding to the structure to align members on certain byte boundaries. Normally this doesn't affect your code that much, but if you are sending this data across a serial stream where every byte matters, then you will be sending random zeros across as well.

This article explains padding well, and how to pack your structures for use cases like yours:

Structure Padding
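The effect is easy to observe with `sizeof` and `offsetof`. The sketch below uses a hypothetical three-member struct, not the one from the question; the exact amount of padding is implementation-defined, so the code measures it instead of assuming a value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: 1 + 2 + 1 = 4 bytes of actual data. */
typedef struct {
    uint8_t  soh;
    uint16_t crc;   /* usually aligned to 2 bytes, so a pad byte may precede it */
    uint8_t  eot;
} frame_tail_t;

/* Number of data bytes the protocol would expect on the wire. */
#define FRAME_TAIL_WIRE_SIZE (sizeof(uint8_t) + sizeof(uint16_t) + sizeof(uint8_t))

/* Returns how many padding bytes the compiler added to the struct. */
size_t frame_tail_padding(void)
{
    return sizeof(frame_tail_t) - FRAME_TAIL_WIRE_SIZE;
}

/* Returns how many padding bytes sit between soh and crc. */
size_t frame_tail_gap_before_crc(void)
{
    return offsetof(frame_tail_t, crc) - sizeof(uint8_t);
}
```

On many common targets this struct occupies 6 bytes, not 4, but that is not guaranteed. Any non-zero result means that sending `sizeof(frame_tail_t)` raw bytes would put extra bytes on the wire.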

answered Nov 10 '22 at 15:43

Bojan Gavrilovic

RS-232 is there because this will be running on some equipment that still uses it. RS-232 is an old, reliable data transfer protocol that gets the job done, hence it is still widely used in different industries. It would be really helpful if someone with a good grasp of protocols could post some general snippet ideas on how to approach this problem. So far all the answers, although helpful, have not addressed the core problem I have, which is: I don't know how to efficiently, on the protocol level, send the data as char/string and insert a semicolon after each data value. – wirelabs Nov 10 '22 at 18:23
The problem is that the id1 string could be e.g. "ThisIsID1" or "ID1", so it has no fixed length, but the max length is 20. How to recognize when to insert a semicolon after each such record? How would you approach coding this protocol? – wirelabs Nov 10 '22 at 18:24
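One possible approach to the variable-length fields raised in these comments is to append each field to a text buffer through a small helper that inserts the separator itself. This is only a sketch: `field_writer_t` and `append_field` are hypothetical names, and it assumes a semicolon goes between fields with none after the last one.

```c
#include <stddef.h>
#include <string.h>

#define ID_MAX_LEN 20 /* maximum field length stated in the question */

/* Tracks the output buffer and how many fields have been written so far. */
typedef struct {
    char    *buf;
    size_t   cap;
    size_t   len;
    unsigned fields;
} field_writer_t;

/* Appends at most max_len characters of s, preceded by ';' for every field
 * except the first. Keeps buf NUL-terminated.
 * Returns 0 on success, -1 if the field does not fit. */
int append_field(field_writer_t *w, const char *s, size_t max_len)
{
    size_t n = 0;
    while (n < max_len && s[n] != '\0')   /* bounded length, stops at max_len */
        n++;

    size_t sep = (w->fields > 0) ? 1 : 0;
    if (w->len + sep + n + 1 > w->cap)    /* +1 for the terminating NUL */
        return -1;

    if (sep)
        w->buf[w->len++] = ';';
    memcpy(w->buf + w->len, s, n);
    w->len += n;
    w->buf[w->len] = '\0';
    w->fields++;
    return 0;
}
```

Because the helper measures each string at the moment it is appended, "ID1" and "ThisIsID1" both produce exactly as many bytes as they contain, and the transmit loop only needs to send `len` bytes of the finished buffer.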