1

I ran into a situation where I am generating random data packets to be transmitted over the uart of a Raspberry PI 3b+ with the Raspbian OS. I found that after 240 or so calls to rand() that an extra character is being generated in my data packet that could possible cause a 1 byte buffer overrun. So if the data length of my packet is 48 bytes, 49 bytes will be seen on the wire by the receiving devices but not by the transmitting device. I have verified that 1 extra byte is being sent on the wire via memory analysis on my receiving device.

Has anyone run into a situation where if the write() syscall is interrupted while writing to the /dev/serial0 device socket, the raspberry pi will send an incorrect byte?

My best guess is that when the pseudorandom number pool in Linux is being refreshed via the kernel during that system call. When I transmit an array of data of 48 bytes of fixed data, I don't get that spurious byte transmission, but I get that spurious byte if I auto generate .

The program I wrote has 2 pthreads. One for generating the packets and another for transmitting them. The transmitting thread at t=0 sleeps automatically because there is no data to send. The data generating thread will wake up the transmitting thread after 4 packets have been produced. I can regularly recreate this issue even if I sleep my transmitting thread for 1s after every transmission.

I tried to cut down on the unused code, but its still over 300 lines. It was a regular issue now its playing cat & mouse with me.

To Build:
gcc -pthread -g -o exp exp.c
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <termios.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <time.h>

#include <assert.h>


#define SPORT "//dev//serial0"
#define NETPAYLOADSZ 256

#define DATPOOLSZ 16


typedef union __attribute__((packed)) PACKET
{
    struct
    {
        uint8_t SID;
        uint8_t DID;
        uint8_t LEN;
        uint8_t SPT;
        uint8_t DPT;
        uint8_t TID;
        uint8_t DAT[249];//very last byte is the checksum
    };
    uint8_t packet[NETPAYLOADSZ];
}uPacket;

uPacket *DataPool[DATPOOLSZ];
enum THRDSTAT { eNOTRUNNING, eRUNNING, eSLEEPING, eSTOPPED };

struct SerRes
{
        int sigint;
        int serialFD;
        int datInPoolIdx; //writen by GenDat, read by SerTx
        int datOutPoolIdx;//written by SerTx
        enum THRDSTAT statusGen;
        enum THRDSTAT statusTX;
        uint8_t dataAvail;

        pthread_mutex_t sermut;
        pthread_mutex_t genSleepmux;
        pthread_mutex_t serTxSleepmux;

        pthread_cond_t genDatSleep;
        pthread_cond_t serTxSleep;
};

struct SerRes serres;
uPacket stDatPool[DATPOOLSZ];

void SIGHandler(int num)
{
    struct termios options;

    if(serres.sigint == 1)
    {
        close(serres.serialFD);
        serres.statusGen = eSTOPPED;
        serres.statusTX = eSTOPPED;
        exit(-1);
        return;
    }
    tcgetattr(serres.serialFD, &options);
    options.c_cc[VTIME] = 1; //timeout of 100ms
    options.c_cc[VMIN]  = 0; //1 receive atleast 1 character

    tcsetattr(serres.serialFD, TCSANOW, &options);

    serres.sigint = 1;
}

void Init(char *serpath)
{
    struct termios options;

    memset(&serres, 0, sizeof(struct SerRes));
    memset(&DataPool, 0, sizeof(uPacket*)*DATPOOLSZ);

    pthread_mutex_init(&serres.sermut, NULL);
    pthread_mutex_init(&serres.genSleepmux, NULL);

    pthread_cond_init(&serres.genDatSleep, NULL);
    pthread_cond_init(&serres.serTxSleep, NULL);


    serres.serialFD = open(serpath, O_RDWR | O_NOCTTY | O_NDELAY);
    if(serres.serialFD < 0)
    {
        printf("\nError no is: %d", errno);
        printf("\nError description: %s\n", strerror(errno));
        exit(-1);
    }

    cfmakeraw(&options);
    tcgetattr(serres.serialFD, &options);
    options.c_cflag = CS8 | CLOCAL | CREAD;
    options.c_lflag &= ~(ICANON | ECHO | ECHOE | ISIG );
    options.c_cc[VTIME] = 10; //timeout of 100ms
    options.c_cc[VMIN]  = 1; //1 receive atleast 1 character

    cfsetspeed(&options, B115200); //set both input and output speed

    tcsetattr(serres.serialFD, TCSANOW, &options);
}

void Deinit(void)
{
    close(serres.serialFD);
    pthread_mutex_destroy(&serres.sermut);
    pthread_mutex_destroy(&serres.genSleepmux);
    pthread_cond_destroy(&serres.genDatSleep);
}


void *DatGen(void *arg)
{
    int randDev = 0;
    uint8_t idx;
    uint8_t cksum = 0;

    uint8_t buffer[48] = {0x91, 0x39, 0x97, 0xb9, 0x50, 0x9b, 0x7a, 0x33, 0xe3, 0x1, 0xfa, 0x82, 0x61, 0xbd, 0xec, 0x1,\
                      0x8,  0x5,  0xd,  0x9c, 0x27, 0xcc, 0x4e, 0x8e, 0x63, 0x48, 0x37, 0x3b, 0x66, 0xde, 0x48, 0x77,\
                      0x98, 0xdf, 0x31, 0x68, 0xfa, 0x2b, 0x9b, 0x5f, 0x2c, 0x96, 0xe1, 0xd,  0x54, 0x4f, 0xf,  0x5c};

    pid_t tid = (pid_t)syscall(__NR_gettid);


    printf("\nDatGen - %d: Starting: \r\n", tid);
    serres.statusGen = eRUNNING;
    srand(0x089FFEE4);

    while(serres.sigint == 0 && serres.statusGen != eSTOPPED)
    {

        //Sleep Condition
        pthread_mutex_lock(&serres.genSleepmux);
        if((serres.dataAvail == DATPOOLSZ))
        {
            printf("DatGen - %d: No more Data to Generate: Sleeping - %d\r\n", tid, serres.dataAvail );
            serres.statusGen = eSLEEPING;
            while(serres.dataAvail == DATPOOLSZ) //Gaurd against spurious wake up events
            {
                    pthread_cond_wait(&serres.genDatSleep, &serres.genSleepmux);
            }
            printf("Datgen - %d: ******** Wokeup, running\r\n\n", tid);

            if(serres.statusTX == eSTOPPED)
                    break;

            serres.statusGen = eRUNNING;
        }
        pthread_mutex_unlock(&serres.genSleepmux);

        //Time to wake up the SerTX thread? Maybe?
        if(serres.dataAvail > 3 && serres.statusTX == eSLEEPING)
        {
                    pthread_mutex_lock(&serres.serTxSleepmux);
                    pthread_cond_signal(&serres.serTxSleep);
                    pthread_mutex_unlock(&serres.serTxSleepmux);
        }

        //Generate the Packets.
        idx = serres.datInPoolIdx;

        serres.datInPoolIdx = (serres.datInPoolIdx + 1) & (DATPOOLSZ - 1);
        assert(serres.datInPoolIdx < DATPOOLSZ);
        stDatPool[idx].SID = 0x55;
        stDatPool[idx].DID = 0xAA;
        stDatPool[idx].LEN = 0x30; //(rand() % 100);
        stDatPool[idx].SPT = 0xEE;
        stDatPool[idx].DPT = 0x22;
        stDatPool[idx].TID = 0x77;


        for(int i = 0; i < stDatPool[idx].LEN ; i++)
        {
                stDatPool[idx].DAT[i] = rand() % 100; //Only Write
                cksum += stDatPool[idx].DAT[i];
        }

        stDatPool[idx].LEN += 7;

        cksum += stDatPool[idx].SID + stDatPool[idx].DID + stDatPool[idx].SPT + stDatPool[idx].LEN;
        cksum += stDatPool[idx].DPT + stDatPool[idx].TID;

        stDatPool[idx].packet[stDatPool[idx].LEN-1] = 0xFF;

        /********* Critical Shared Section *************/
        pthread_mutex_lock(&serres.sermut);

                serres.dataAvail++; //Touched by both threads...
                assert(serres.dataAvail < DATPOOLSZ+1);
                if(serres.dataAvail == DATPOOLSZ)
                {
                        printf("Max Dat Reached\r\n");
                }

        pthread_mutex_unlock(&serres.sermut);
        /*************************************************/

    }
    serres.statusGen = eSTOPPED;
    pthread_exit(&serres.sigint);
}

#define LOOPCOUNT 8
void *SerTx(void *arg)
{
        pid_t tid = (pid_t)syscall(__NR_gettid);
        uint8_t idx = 0;
        uint16_t randel = 0;
        uint8_t count = 0;
        uint8_t bytesSent = 0;

        serres.statusTX = eRUNNING;

        while(serres.sigint == 0 && serres.statusTX != eSTOPPED && count < LOOPCOUNT)
        {

            //Sleep Condition
            pthread_mutex_lock(&serres.genSleepmux);
            if(serres.dataAvail < 1)
            {

                pthread_cond_signal(&serres.genDatSleep);
                printf("SerTx - %d: All Data Consumed\r\n", tid);

                serres.statusTX = eSLEEPING;
                while(serres.dataAvail < 1) //Gaurd against spurious wakeup events.
                {
                        pthread_cond_wait(&serres.serTxSleep, &serres.genSleepmux);
                }

                serres.statusTX = eRUNNING;
                printf("SerTx - %d: ^^^^^^^ Woke up Running\r\n", tid);
            }
            pthread_mutex_unlock(&serres.genSleepmux);

            //Output
            idx = serres.datOutPoolIdx;
            serres.datOutPoolIdx = (serres.datOutPoolIdx + 1) & (DATPOOLSZ - 1);

            bytesSent = write(serres.serialFD, &stDatPool[idx].packet[0], stDatPool[idx].LEN); //only Read

            if(stDatPool[idx].LEN != bytesSent) //did we not send all the bytes?
            {
                printf("Pkt Len: %x\nBytesSent: %x\r\n", stDatPool[idx].LEN, bytesSent);
                assert(0);
            }

            printf("Consume: %x\r\n", stDatPool[idx].LEN);

            /********* Critical Shared Section *************/
            pthread_mutex_lock(&serres.sermut);

                serres.dataAvail--; //shared write
                assert(serres.dataAvail < DATPOOLSZ); //unsigned, so if it goes negative, it goes BIG!!

            pthread_mutex_unlock(&serres.sermut);
            /*************************************************/

            //usleep(1000000);
            //tcflush(serres.serialFD, TCIOFLUSH);
            count++;
        }
        serres.statusTX = eSTOPPED;
        pthread_cond_signal(&serres.genDatSleep);
        pthread_exit(&serres.sigint);
}

/*
 *  pthread DatGen generates the data for serTx
 *  pthread SerTx consumes the data generated by DatGen and sends out the serial port
 */
int main(int argc, char **argv)
{
    pthread_t datGen, serRx, serTx;
    pid_t pid = getpid();
    int thrdstat = 0;


    Init(SPORT);
    signal(SIGINT, SIGHandler);
    pthread_create(&datGen, NULL, DatGen, NULL);
    pthread_create(&serTx,  NULL, SerTx,  NULL);


    //Wait for all the threads to close
    pthread_join(datGen,NULL);
    pthread_join(serTx,NULL);

    Deinit();
    printf("\n>>>> End <<<<< %d\r\n", pid);
    return 0;
}
This is generated and subsequently transmitted in bulk
packet = 
0x55, 0xaa, 0x37, 0xee, 0x22, 0x77, 0x4c, 0xbf, 0x8 , 0xad, 
0xeb, 0xc9, 0xa,  0xb2, 0x1d, 0x45, 0x57, 0x48, 0xc0, 0xc1, 
0xa3, 0x0,  0xb4, 0x73, 0x91, 0x8b, 0x28, 0x17, 0x3 , 0x40, 
0x62, 0x48, 0x86, 0xc7, 0x9e, 0x60, 0xc2, 0xea, 0x20, 0xca, 
0x98, 0x8c, 0x94, 0x22, 0xbe, 0x32, 0x67, 0x96, 0xf9, 0x28, 
0xd7, 0x1d, 0xa7, 0x8c, 0xff

This is received by the device.
packet = 
0x55, 0xaa, 0x37, 0xee, 0x22, 0x77, 0x4c, 0xbf, 0x8 , 0xad, 
0xeb, 0xc9, 0xd,  0xa,  0xb2, 0x1d, 0x45, 0x57, 0x48, 0xc0,  
0xc1, 0xa3, 0x0,  0xb4, 0x73, 0x91, 0x8b, 0x28, 0x17, 0x3, 
0x40, 0x62, 0x48, 0x86, 0xc7, 0x9e, 0x60, 0xc2, 0xea, 0x20,
0xca, 0x98, 0x8c, 0x94, 0x22, 0xbe, 0x32, 0x67, 0x96, 0xf9, 
0x28, 0xd7, 0x1d, 0xa7, 0x8c

Byte 22 (0 indexed) is incorrect. 0xD is in the received in place of 0xC9 and the last byte, 0xFF is not correctly received by the receiver. I failed to mention that there is no flow control and that is by design.

Update ...

All credit goes to: Craig Estey

Ok, I think I have resolution. The problem I was having was not properly setting up the serial socket. I found that my call to: cfmakeraw(&options); was in the wrong location. Thus, the OPOST option for c_oflags was not deasserted. As pointed out below in the answer update, when the kernel saw 0xA in my data, it was automatically sending 0xD. The corrected code no longer shows this behavior.

Secondly it was pointed out that there was a Race Condition in my producer-consumer relationship. I don't necessarily agree with the analysis but I looked at anyway and changed how full and empty were defined in my program. I think I was going to originally go this route but I was having issues with thread sychronization due to lost signals in pthreads....annoying... pthread lost wake ups: Lost wakeups in pthreads

Final Code that appears to work:

#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <termios.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <time.h>

#include <assert.h>


#define SPORT "//dev//serial0"
#define NETPAYLOADSZ 256

typedef union __attribute__((packed)) PACKET
{
    struct
    {
        uint8_t SID;
        uint8_t DID;
        uint8_t LEN;
        uint8_t SPT;
        uint8_t DPT;
        uint8_t TID;
        uint8_t DAT[249];//very last byte is the checksum
    };
    uint8_t packet[NETPAYLOADSZ];
}uPacket;

enum THRDSTAT { eNOTRUNNING, eRUNNING, eSLEEPING, eSTOPPED };

struct SerRes
{
    int sigint;
    int serialFD;
    uint16_t datInPoolIdx; //writen by GenDat, read by SerTx
    uint16_t datOutPoolIdx;//written by SerTx
    enum THRDSTAT statusGen;
    enum THRDSTAT statusTX;
    uint8_t genSig;
    uint8_t txSig;
    uint8_t StartCond;
    uint8_t dataAvail;

    pthread_mutex_t sermut;
    pthread_mutex_t genSleepmux;
    pthread_mutex_t serTxSleepmux;

    pthread_cond_t genDatSleep;
    pthread_cond_t serTxSleep;
};

#define DATPOOLSZ 16
struct SerRes serres;
uPacket stDatPool[DATPOOLSZ];

void SIGHandler(int num)
{
    struct termios options;

    if(serres.sigint == 1)
    {
        close(serres.serialFD);
        serres.statusGen = eSTOPPED;
        serres.statusTX = eSTOPPED;
        exit(-1);
        return;
    }
    tcgetattr(serres.serialFD, &options);
    options.c_cc[VTIME] = 1; //timeout of 100ms
    options.c_cc[VMIN]  = 0; //1 receive atleast 1 character

    tcsetattr(serres.serialFD, TCSANOW, &options);

    serres.sigint = 1;
}

void Init(char *serpath)
{
    struct termios options;

    memset(&serres, 0, sizeof(struct SerRes));
    memset(&stDatPool, 0, sizeof(uPacket*)*DATPOOLSZ);

    //serres.datInPoolIdx = 1; //starting condition
    serres.StartCond = 1;
    pthread_mutex_init(&serres.sermut, NULL);
    pthread_mutex_init(&serres.genSleepmux, NULL);

    pthread_cond_init(&serres.genDatSleep, NULL);
    pthread_cond_init(&serres.serTxSleep, NULL);


    serres.serialFD = open(serpath, O_RDWR | O_NOCTTY | O_NDELAY);
    if(serres.serialFD < 0)
    {
        printf("\nError no is: %d", errno);
        printf("\nError description: %s\n", strerror(errno));
        exit(-1);
    }


    tcgetattr(serres.serialFD, &options);
    cfmakeraw(&options);
    options.c_cflag = CS8 | CLOCAL | CREAD;
    options.c_lflag &= ~(ICANON | ECHO | ECHOE | ISIG );
    options.c_oflag &= ~OPOST;
    options.c_cc[VTIME] = 10; //timeout of 100ms
    options.c_cc[VMIN]  = 1; //1 receive atleast 1 character

    cfsetspeed(&options, B115200); //set both input and output speed

    tcsetattr(serres.serialFD, TCSANOW, &options);
}

void Deinit(void)
{
    close(serres.serialFD);
    pthread_mutex_destroy(&serres.sermut);
    pthread_mutex_destroy(&serres.genSleepmux);
    pthread_cond_destroy(&serres.genDatSleep);
}


void *DatGen(void *arg)
{
    int randDev = 0;
    uint8_t idx;
    uint8_t cksum = 0;

    pid_t tid = (pid_t)syscall(__NR_gettid);

    printf("\nDatGen - %d: Starting: \r\n", tid);
    serres.statusGen = eRUNNING;
    srand(0x089FFEE4);

    serres.datInPoolIdx = 1;// (serres.datInPoolIdx + 1) & (DATPOOLSZ - 1);

    while(serres.sigint == 0 && serres.statusGen != eSTOPPED)
    {

        //Sleep Condition
        pthread_mutex_lock(&serres.genSleepmux);
        if(serres.datInPoolIdx == serres.datOutPoolIdx) //full condition
        {
            printf("DatGen - %d: Sleeping - %d\r\n", tid, serres.dataAvail );
            serres.statusGen = eSLEEPING;
            serres.StartCond = 0;
            if(serres.statusTX == eSLEEPING)
            {
                serres.txSig = 1;
                pthread_cond_signal(&serres.serTxSleep);
            }

            while((serres.datInPoolIdx == serres.datOutPoolIdx) && (serres.genSig == 0)) //Gaurd against spurious wake up events
            {
                serres.genSig = 0;
                pthread_cond_wait(&serres.genDatSleep, &serres.genSleepmux);
            }
            serres.genSig = 0;
            printf("Datgen - %d: Wokeup\r\n", tid);

            if(serres.statusTX == eSTOPPED)
                    break;

            serres.statusGen = eRUNNING;
        }

        idx = serres.datInPoolIdx;
        assert(idx < DATPOOLSZ);
        serres.datInPoolIdx = (serres.datInPoolIdx + 1) & (DATPOOLSZ - 1);

        pthread_mutex_unlock(&serres.genSleepmux);

        //Time to wake up the SerTX thread? Maybe?

        //Generate the Packets.
        stDatPool[idx].SID = 0x55;
        stDatPool[idx].DID = 0xAA;
        stDatPool[idx].LEN = 0x30; //(rand() % 100);
        stDatPool[idx].SPT = 0xEE;
        stDatPool[idx].DPT = 0x22;
        stDatPool[idx].TID = 0x77;


        for(int i = 0; i < stDatPool[idx].LEN ; i++)
        {
            stDatPool[idx].DAT[i] = i; //rand() % 100; //Only Write
            cksum += stDatPool[idx].DAT[i];
        }

        stDatPool[idx].LEN += 7;

        cksum += stDatPool[idx].SID + stDatPool[idx].DID + stDatPool[idx].SPT + stDatPool[idx].LEN;
        cksum += stDatPool[idx].DPT + stDatPool[idx].TID;

        stDatPool[idx].packet[stDatPool[idx].LEN-1] = 0xFF;

        /********* Critical Shared Section *************/
        pthread_mutex_lock(&serres.sermut);

            serres.dataAvail++; //Touched by both threads...

        pthread_mutex_unlock(&serres.sermut);
        /*************************************************/

    }
    serres.statusGen = eSTOPPED;
    pthread_exit(&serres.sigint);
}

#define LOOPCOUNT 8
void *SerTx(void *arg)
{
    pid_t tid = (pid_t)syscall(__NR_gettid);
    uint8_t idx = 0;
    uint16_t randel = 0;
    uint32_t count = 0;
    uint8_t bytesSent = 0;

    serres.statusTX = eRUNNING;
    printf("SerTx - %d: Starting\r\n", tid);

    while(serres.sigint == 0 && serres.statusTX != eSTOPPED)// && count < LOOPCOUNT)
    {

        //Sleep Condition
        pthread_mutex_lock(&serres.genSleepmux);
        serres.datOutPoolIdx = (serres.datOutPoolIdx + 1) & (DATPOOLSZ -1);

        if((serres.datOutPoolIdx == serres.datInPoolIdx)) //Empty Condition, sleep on first start.
        {

            if(serres.statusGen == eSLEEPING)
            {
                    printf("Wake GenDat\r\n");
                    pthread_cond_signal(&serres.genDatSleep);
                    serres.genSig = 1;
            }

            printf("SerTx - %d: Sleep\r\n", tid);

            serres.statusTX = eSLEEPING;
            while((serres.datOutPoolIdx == serres.datInPoolIdx) && serres.txSig == 0) //Gaurd against spurious wakeup events.
            {
                    serres.txSig = 0;
                    pthread_cond_wait(&serres.serTxSleep, &serres.genSleepmux);

            }
            serres.txSig = 0;
            serres.statusTX = eRUNNING;
            printf("SerTx - %d: Running\r\n", tid);
        }

        idx = serres.datOutPoolIdx;
        assert(idx < DATPOOLSZ);


        pthread_mutex_unlock(&serres.genSleepmux);

        //Output

        if(stDatPool[idx].SID != 0x55)
                assert(stDatPool[idx].SID == 0x55);

        if(stDatPool[idx].DID != 0xAA)
                assert(stDatPool[idx].DID != 0xAA);

        if(stDatPool[idx].LEN != 0x37) //(rand() % 100);
                assert(stDatPool[idx].LEN == 0x37);

        if(stDatPool[idx].SPT != 0xEE)
                assert(stDatPool[idx].SPT == 0xEE);

        if(stDatPool[idx].DPT != 0x22)
                assert(stDatPool[idx].DPT == 0x22);

        if(stDatPool[idx].TID != 0x77)
                assert(stDatPool[idx].TID == 0x77);

        for(int i = 0; i < (stDatPool[idx].LEN-7); i++)
        {
                assert(stDatPool[idx].DAT[i] == i);
        }

        if(stDatPool[idx].packet[stDatPool[idx].LEN-1] != 0xFF)
                assert(stDatPool[idx].packet[stDatPool[idx].LEN-1] == 0xFF);

        /*  bytesSent = write(serres.serialFD, &stDatPool[idx].packet[0], stDatPool[idx].LEN); //only Read

        if(stDatPool[idx].LEN != bytesSent) //did we not send all the bytes?
        {
            printf("Pkt Len: %x\nBytesSent: %x\r\n", stDatPool[idx].LEN, bytesSent);
            assert(0);
        }
                */
        //printf("Consume: %d\r\n", stDatPool[idx].LEN);

        /********* Critical Shared Section *************/
        pthread_mutex_lock(&serres.sermut);

            memset(&stDatPool[idx], 0, sizeof(stDatPool[idx]));
            serres.dataAvail--; //shared write

        pthread_mutex_unlock(&serres.sermut);
        /*************************************************/

        //usleep(1000000);
        //tcflush(serres.serialFD, TCIOFLUSH);
        count++;

    }
    serres.statusTX = eSTOPPED;
    pthread_cond_signal(&serres.genDatSleep);
    pthread_exit(&serres.sigint);
}


/*
 *  pthread DatGen generates the data for serTx
 *  pthread SerTx consumes the data generated by DatGen and sends out the serial                                                                                          port
 */
int main(int argc, char **argv)
{
    pthread_t datGen, serRx, serTx;
    pid_t pid = getpid();
    int thrdstat = 0;


    Init(SPORT);
    signal(SIGINT, SIGHandler);
    pthread_create(&datGen, NULL, DatGen, NULL);
    pthread_create(&serTx,  NULL, SerTx,  NULL);


    //Wait for all the threads to close
    pthread_join(datGen,NULL);
    while(serres.StartCond == 1);
    pthread_join(serTx,NULL);

    Deinit();
    printf("\n>>>> End <<<<< %d\r\n", pid);
    return 0;
}


MadVocoder42
  • 83
  • 1
  • 7
  • Please provide a [complete minimal verifiable example](https://stackoverflow.com/help/minimal-reproducible-example) that can reproduces the problem. Describing code in words alone is almost never precise enough to diagnose problems. The problem may be in a totally different area that you have not described so showing the code is vital to someone else looking at the problem without any (possibly wrong) assumptions. – kaylum Feb 22 '21 at 21:09
  • Sure it isn't about every 256? What happens when your random byte is XOFF? – stark Feb 22 '21 at 21:16
  • Definitely need to see the code, including (especially) the setup of the serial port `tcsetattr` settings – Steve Friedl Feb 22 '21 at 21:59
  • @Kaylum I can post the program as-is after some clean up. To rewrite the progam from scratch can possibly change the context and the nature of the problem I'm seeing. I was more looking for any insight that might cause the problem, however, I'm happy to share what I have since its not anything more than a slightly more complex example program. – MadVocoder42 Feb 22 '21 at 21:59
  • Please _edit_ your question and post your code in a code block here. It is _very_ likely you have a race condition between your threads. This is _easily_ caused by scheduling issues for code [that you've written] that isn't thread safe. The _kernel's_ `write` syscall will _not_ inject anything because of scheduling. The only thing that would "interrupt" such a syscall would be a thread receiving a signal [and the return would be -1 and `errno` would be `EINTR`]. But, that is almost certainly _not_ what is happening. Once a `write` is issued [on serial port], _all_ bytes will be sent in order. – Craig Estey Feb 22 '21 at 22:00
  • Well at least the indenting isn't completely funky. – MadVocoder42 Feb 22 '21 at 22:29
  • *a Raspberry PI 3b+* Are assignments to `int` values atomic? The use of `serres.sigint` as a flag is extremely problematic - there's no guarantee that a change made by the signal handler to the value in `serres.sigint` is visible or will be acted on by any of the threads with that value supposedly being checked by a `while()` condition - the compiler is free to assume that the value in `serres.sigint` never changes. See [C11 **5.1.2.3 Program execution**, paragraph 5](https://port70.net/~nsz/c/c11/n1570.html#5.1.2.3p5). – Andrew Henle Feb 22 '21 at 23:51
  • @AndrewHenle The serres.sigint doesn't work well in practice, I was trying to break everything cleanly. It works some of the time and well enough that I can get the program to exit even if its not clean. It can be better executed. – MadVocoder42 Feb 23 '21 at 00:02
  • 1
    Just dont use `__attribute__((packed))` :: you don't need it. It is ugly, anyway. – wildplasser Feb 23 '21 at 00:43

1 Answers1

1

Edit: Thank you for editing your question and posting your generated data and the corresponding actual data received at the remote device.

Your problem [and solution] is much simpler. See the UPDATE #2 section below.


You are accessing your ring queue index variables (e.g. datInPoolIdx and datOutPoolIdx) outside of a locked region.

I admire all the work you put into this. But, I think you've got a bit too much complexity.

You really only need to access the index variables under lock.

Loosely ...

Tx thread should sleep if the ring queue is empty:

enqidx == deqidx

Gen thread should sleep if the ring queue is full:

((enqidx + 1) % DATPOOLSIZ) == deqidx

The pthread_cond_wait/pthread_cond_signal conditions should be based on a comparison of these values using something like the above.


Here's some pseudo code, based loosely on what you have. It doesn't have any condition variables, but I think you'll get the idea of how to add that if needed.

These functions only do byte-at-a-time. But, there is a way to modify them, so they produce a length and number of contiguous bytes either free or available so you can use memcpy to move a bunch of bytes into/out of the queue in bulk.

int
queue_wrap_idx(int idx)
{

    idx += 1;
    idx %= DATPOOLSIZ;

    return idx;
}

// gen_avail -- space available in queue
// RETURNS: index of place to store (or -1=full)
int
gen_avail(void)
{

    lock();
    int idxenq = datInPoolIdx;
    int idxdeq = datOutPoolIdx;
    unlock();

    int idxfull = queue_wrap_idx(idxenq);

    if (idxfull == idxdeq)
        idxenq = -1;

    return idxenq;
}

// gen_advance -- advance generator queue index
void
gen_advance(void)
{
    lock();

    int idxenq = datInPoolIdx;
    idxenq = queue_wrap_idx(idxenq);
    datInPoolIdx = idxenq;

    unlock();
}

// tx_avail -- data available in queue
// RETURNS: index of place to dequeue (or -1=empty)
int
tx_avail(void)
{

    lock();
    int idxenq = datInPoolIdx;
    int idxdeq = datOutPoolIdx;
    unlock();

    if (idxdeq == idxenq)
        idxdeq = -1;

    return idxdeq;
}

// tx_advance -- advance transmitter queue index
void
tx_advance(void)
{
    lock();

    int idxdeq = datOutPoolIdx;
    idxdeq = queue_wrap_idx(idxdeq);
    datOutPoolIdx = idxdeq;

    unlock();
}

// gen_thread -- data generator thread
void
gen_thread(void *ptr)
{

    while (1) {
        int idxenq = gen_avail();

        if (idxenq >= 0) {
            DAT[idxenq] = rand();
            gen_advance();
        }
    }
}

// tx_thread -- serial port transmit thread
void
tx_thread(void *ptr)
{

    while (1) {
        int idxdeq = tx_avail();

        if (idxdeq >= 0) {
            char datval = DAT[idxdeq];
            tx_advance();
            write(serport,&datval,1);
        }
    }
}

UPDATE:

I can certainly change it, no biggie, but I don't think its the problem I'm experiencing.

You definitely have race conditions, as I've mentioned. Unfixed/latent race conditions appear very much like glitchy H/W. Until you fix these issues, you can't speculate further.

If I was overwriting the transmitted memory region via the GenDat thread then I would expect to see fragments on the receiving side. What I'm seeing is a single incorrect byte that is affecting my packaging of data.

Build a diagnostic mode [or two] ...

Just have GenThread send an incrementing stream of bytes (vs. rand/whatever). In this mode, skip the UART TX altogether. It is easy for the TxThread to notice a gap [because it should see the sequence 0x00-0xFF repeated indefinitely].

This will run the threads at a much faster rate and be much more likely to show up race conditions.

Remove all [debug] printf. They have locks that can perturb the timing so you're not measuring your [real] system, but the system with the printf. The printf calls are slow and mess things up.

You can add random nanosleep calls to stress the setup further. Be creative. Use the dark side of the force to create a test that is much worse than what the real system would experience.

You can have the GenThread deliberately send an out-of-sequence byte to verify that the TxThread can detect a gap.

When you've got all that working, run the diagnostic mode for a day or so to see what happens. In my experience, you'll see something usually within a few minutes to an hour. When I was doing testing, I'd run it overnight as an acceptance test.

It looks like the uart TX hardware is glitching, is it hardware? LOL yes!!(i'm software) but its most definitely software...somewhere.

Hmm ... Unlikely. I have direct [commercial/product] experience with UART TX on an RPi.

You might have inconsistent setup of RTS/CTS between sender system and receiver system. Or, the Tx/Rx clock frequencies / baud rates are slightly off.

The remote system might be overrunning the buffer (i.e.) it can't keep up with the [burst] data rate.

The receiver might be slow in processing the data. You should use hires timestamping (e.g. clock_gettime(CLOCK_MONOTONIC,...) to mark byte arrival, etc.

What I do for that is have a "trace buffer" that records "event" type and timestamp. Unfortunately [for you ;-)], I use a ring queue to record these events [using stdatomic.h functions such as atomic_compare_exchange_strong] to add/remove the trace entries.

So, you'd need a solid multithread ring queue implementation to save the trace entries [herein, a chicken-and-the-egg problem]. If the receiver is something like FreeRTOS, you'll have to deal with cli/sti and other bare metal considerations.


UPDATE #2:

I wrote a small perl script to analyze and compare your data. This analysis could also be done by creating two files that have the two digit hex values, one per line. Here is the diff -u output:

--- tx.txt  2021-02-23 10:38:22.295135431 -0500
+++ rx.txt  2021-02-23 10:38:22.295135431 -0500
@@ -10,6 +10,7 @@
 AD
 EB
 C9
+0D
 0A
 B2
 1D
@@ -52,4 +53,3 @@
 1D
 A7
 8C
-FF

The data received at the device is identical to the generated sequence except when the generated data byte is:

0A

The receiver gets:

0D 0A

The host kernel TTY layer is sending <CR><LF> when it sees <LF>.

This is because when you were setting up the termios parameters, you did not set up "raw" mode correctly.

Specifically, you didn't disable "implementation-defined output processing" [per the example in man termios].

That is, in Init, you need to add:

options.c_oflag &= ~OPOST;
Craig Estey
  • 30,627
  • 4
  • 24
  • 48
  • In my original implementation, I was sleeping the GenDat thread when the In and Out idx'es were equal. But then you run into the conundrum of what is a full condition and what is an empty condition. With serres.dataAvail it is plainly obvious in what condition you are in. This is my first pthreads app and I am used to doing everything superloop. My understanding of a critical region, is a region where 2 or more threads make decision on a shared resource. Basically, serres.dataAvail is the most critical region. – MadVocoder42 Feb 23 '21 at 00:13
  • I've done a number of multithread ring queue impl for commercial mission critical R/T systems/products. On my 1st impl [18+ years ago], I had separate flags for empty/full [for about a week] until I realized they were the source of my problems. You're racing between (e.g.) `dataAvail` and the queue indexes being set. There isn't really a conundum: full/empty detection is as I described above. – Craig Estey Feb 23 '21 at 00:36
  • I can certainly change it, no biggie, but I don't think its the problem I'm experiencing. If I was overwriting the transmitted memory region via the GenDat thread then I would expect to see fragments on the receiving side. What I'm seeing is a single incorrect byte that is affecting my packaging of data. It looks like the uart TX hardware is glitching, is it hardware? LOL yes!!(i'm software) but its most definitely software...somewhere. – MadVocoder42 Feb 23 '21 at 00:40
  • After I added that update to my post, I was pondering about that byte. I am still working through your suggestions. I will look into adding ``` options.c_oflag &= ~OPOST``` If that is a the case then I can see about crafting a packet with only 0xA and I should see 0xD on the receiver. – MadVocoder42 Feb 23 '21 at 17:35
  • If you add `~OPOST`, and your gen buffer is _only_ `0A` the receiver will _only_ see `0A` [_not_ `0D`]. With `OPOST` on [the default], the TTY layer will convert `\n` into `\r\n` because most ASCII/UART _terminals_ will expect that. You can play with this with `stty` What is the "device" on the other end? – Craig Estey Feb 23 '21 at 19:13
  • The device is an embedded device that I'm developing. All it is doing is taking traffic from one interface and forwarding it to another. Its intent is to support legacy equipment on new networks. Its just a stupid serial data input, different iface output. My usual development platform is embedded systems and Linux is still new to me in this regard. – MadVocoder42 Feb 23 '21 at 19:22
  • What chip/platform (e.g. arduino, stm*, PIC*, arm, etc.) and what OS (e.g. FreeRTOS, QNX, RTEMS, bare metal)? – Craig Estey Feb 23 '21 at 19:30
  • Its baremetal NXP. The exact nature isn't really what I'm trying to discover here. I'm trying to determine what that spurious 0xD is and you might have given me the answer. Though, I'm chasing that race condition that you said exists in my program. I'm not entirely convinced its there because Outidx and InIdx are only touched in their single threads where dataAvail is used by both. Also, my 'cfmakeraw(&options);' call is in the wrong order. I looked at the man page and that fnct call sets the ~OPOST in oflags. It would if I didn't overwrite my ser opts in the first place. – MadVocoder42 Feb 23 '21 at 19:39
  • I did reference my sleep conditions to OutIdx and InIdx but I keep hitting an issue with pthreads missing the signal call to wake up a thread. – MadVocoder42 Feb 23 '21 at 19:40
  • I've had trouble with `cfmakeraw`, so I've usually rolled my own. For some `termios` code, see my answer: https://stackoverflow.com/questions/62668941/how-to-read-serial-with-interrupt-serial/62670017#62670017 – Craig Estey Feb 23 '21 at 20:07