
I'm using an XBee Pro radio module and I log some radio data.

I use an FTDI serial-to-USB converter, so the module appears under /dev/ttyUSB0.

I've written this code:

void TsToCoord::serialConfig()
{
    // Open Serial Port
    cout << "Opening serial port..." << endl;
    fd = open("/dev/ttyUSB0", O_RDWR | O_NOCTTY);

    if (fd < 0 )
    {
        cout << "Error " << errno << " opening /dev/ttyUSB0: " << strerror(errno) << endl;
    }
    else
    {
        //Configure Serial Port
        cout << "Configuring serial port..." << endl;
        struct termios tty;
        memset (&tty, 0, sizeof tty);

        if (tcgetattr (fd, &tty) != 0)
        {
            cout << "Error " << errno << " from tcgetattr: " << strerror (errno) << endl;
        }

        cfsetispeed(&tty, B57600);
        cfsetospeed(&tty, B57600);

        tty.c_cflag &= ~PARENB;
        tty.c_cflag &= ~CSTOPB;
        tty.c_cflag &= ~CSIZE;
        tty.c_cflag |= CS8;
        tty.c_cflag &= ~CRTSCTS;
        tty.c_lflag = 0;
        tty.c_oflag = 0;
        tty.c_cc[VMIN] = 1;
        tty.c_cc[VTIME] = 50;

        tty.c_cflag |= CREAD | CLOCAL;

        cfmakeraw(&tty);

        tcflush(fd, TCIFLUSH);

        if (tcsetattr(fd, TCSANOW, &tty) != 0) 
        {
            cout << "Error " << errno << " from tcsetattr" << endl;
        }
    }
}


void TsToCoord::listenPort()
{
    // Creation of a buffer to store data from radio module
    fill_n(buff, 2048, '\0');
    this-> ind = 0;

    while(true)
    {
        char mes[1024];
        fill_n(mes, 1024, '0');
        //cout << "Blocking read" << endl;
        int rd = read(fd, &mes, sizeof(mes));

        if (rd > 0)
        {
            //cout << "Storing in buffer" << endl;
            storeInBuff(mes, rd);
            fill_n(mes, 1024, '0');

            struct pollfd fds;
            fds.fd = fd;
            fds.events = POLLIN | POLLPRI;
            int slct = 1;

/*
            int slct = 1;
            fd_set rdfds;
            FD_ZERO(&rdfds);
            FD_SET(fd, &rdfds);
            struct timeval to;
            to.tv_sec = 0;
            to.tv_usec = 100000;
*/
            //fd_set rdfdsCopy = rdfds;
            //cout << "Entering second while loop" << endl;
            while (slct > 0)
            {
                //cout << "Call to select" << endl;
                //slct = select((fd+1), &rdfdsCopy, NULL, NULL, &to);
                slct = poll(&fds, 1, 100);
                if (slct > 0)
                {
                    //cout << "Next call to read, would not block" << endl;
                    rd = read(fd, &mes, sizeof(mes));
                    storeInBuff(mes, rd);
                    //rdfdsCopy = rdfds;
                }
            }
            findFrame(0);
            ind = 0;
            fill_n(buff, 2048, '\0');
        }
    }
}

My problem is that when it's launched, it works perfectly. But after about 20 minutes, it stops doing its work. The CPU usage goes to about 100%, so the read call doesn't seem to block anymore. It's as if the file descriptor isn't linked to the device anymore...

Since I have absolutely no idea what causes the bug, and since it takes a random amount of time before it fails, I can't just un-daemonize it and watch the output in my terminal...

So I want to ask:

  • Is there a big error in the code that I didn't find?
  • What could be the reason it fails? I thought about another program trying to use the same device, but that doesn't seem to be the case. I don't think it's a baud rate problem either.

I added a check for the case where the radio is unplugged, so the program can't exit on its own. I'm pretty sure that's not the problem, since the module stays under /dev/ttyUSB0 after the failure occurs.

I use Debian 3.2.57-3 i686.

Other software works with my radio module without any problems.

I don't seem to have problems using this code on another, similar computer...

Thanks for reading, and sorry for my not-so-good English.

EDIT: To be more precise about what I want from this post and from this program: at some point the program fails to block on read, and every later call to read neither blocks nor reads anything, so the program no longer does what it was created for: logging data coming from the radio module. I simply want to avoid this, since otherwise it works perfectly fine, and it could cause harm to hardware.

rmilville
  • @Olaf Like I said, at some point the program fails to block on read, and every later call to read neither blocks nor reads anything, so the program no longer does what it was created for: logging data coming from the radio module. I simply want to avoid this, since otherwise it works perfectly fine, and it could cause harm to hardware. – rmilville Jul 20 '16 at 12:19
  • @Kyll: OP showed no effort providing the required things, so he can at least find the links himself. (The links will be shown anyway with the close message; I'm not required to leave a comment at all, and for this question it is obvious.) So take it as a courtesy and don't tell me what I should or should not do! – too honest for this site Jul 20 '16 at 12:22
  • @Kyll thanks. I cut the code so that only the serial-related parts appear. I always think there may be some other obvious error elsewhere, so I posted all the code. Since I didn't find anyone who had this problem (working at first and then failing at a random time), I thought there was a more subtle error in it. My bad for posting just for a test. – rmilville Jul 20 '16 at 12:50
  • @rmilville Thank you for your edit and attention! I retracted my close vote. Hope you'll find a solution to your issue. Once it's solved, make sure to edit your post again to maximize its visibility and usefulness to future readers (or even post a self-answer describing how you solved this issue). – Kyll Jul 20 '16 at 12:55
  • A working digital computer system **very** rarely has **random** crashes. – too honest for this site Jul 20 '16 at 13:05
  • Your initialization code is so close to being POSIX compliant, except for 2 statements: `tty.c_lflag = 0` and `tty.c_oflag = 0`. You can delete those two because they are superfluous when you use **cfmakeraw()**. The **memset()** is also unnecessary. – sawdust Jul 21 '16 at 02:37
  • @sawdust I did find a way to make it work, but thanks for your advice; I'll apply it. Is it possible that what you pointed out causes something in this code to fail? – rmilville Jul 21 '16 at 07:57
  • In theory there could be something hiding in the termios structure (that's the reason for following the guidelines), but it's not likely to make a difference. Also `read(fd, &mes, sizeof(mes))` should be `read(fd, mes, sizeof(mes))`: `mes` is an array that already decays to a pointer to its first byte, so taking `&mes` (a pointer to the whole array) is unnecessary. – sawdust Jul 21 '16 at 08:29
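Following sawdust's comments, here is a minimal sketch of how the port setup could look if it starts from `tcgetattr()`, relies on `cfmakeraw()` instead of zeroing `c_lflag`/`c_oflag` by hand, and reports failures to the caller instead of carrying on. The names `configureSerial()` and `openSerial()` are hypothetical; the speed and `VMIN`/`VTIME` values are the ones from the question. One detail worth knowing: glibc's `cfmakeraw()` also sets `VMIN` to 1 and `VTIME` to 0, so in the original code, where `cfmakeraw()` runs after `c_cc` is assigned, `VTIME = 50` is silently overwritten. The sketch therefore sets `c_cc` afterwards.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

// Hypothetical rework of serialConfig(): configures an already-open fd and
// returns false instead of continuing when a termios call fails.
bool configureSerial(int fd)
{
    termios tty;
    if (tcgetattr(fd, &tty) != 0) {          // start from the driver's current settings
        std::fprintf(stderr, "tcgetattr: %s\n", std::strerror(errno));
        return false;
    }
    cfmakeraw(&tty);                         // raw mode: CS8, no parity, no echo/processing
    cfsetispeed(&tty, B57600);
    cfsetospeed(&tty, B57600);
    tty.c_cflag &= ~(CSTOPB | CRTSCTS);      // 1 stop bit, no hardware flow control
    tty.c_cflag |= CREAD | CLOCAL;           // enable receiver, ignore modem control lines
    tty.c_cc[VMIN] = 1;                      // set after cfmakeraw(), which resets VMIN/VTIME in glibc
    tty.c_cc[VTIME] = 50;                    // 5 s inter-byte timeout once a byte has arrived

    tcflush(fd, TCIFLUSH);                   // drop anything received before configuration
    if (tcsetattr(fd, TCSANOW, &tty) != 0) {
        std::fprintf(stderr, "tcsetattr: %s\n", std::strerror(errno));
        return false;
    }
    return true;
}

// Opens and configures the port; returns -1 (closing the fd) on any failure.
int openSerial(const char* path)
{
    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0) {
        std::fprintf(stderr, "open %s: %s\n", path, std::strerror(errno));
        return -1;
    }
    if (!configureSerial(fd)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Returning an error instead of printing and continuing means the caller never ends up calling `read()` on a descriptor that was never opened or never configured.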

2 Answers


A quick look at your code: if you get an EOF or error on read (i.e., read() returns zero or -1), you will loop forever. I can see you're in raw mode, but I have seen various bugs in the FTDI drivers and firmware which could cause that to happen, and it isn't a case you're handling.
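To make this concrete: `poll()` returns a positive count not only when data is readable but also when the device reports a hang-up or an error (`POLLHUP` and `POLLERR` are always reported in `revents`), and `read()` then returns 0 or -1 immediately. In the question's inner `while (slct > 0)` loop that combination never terminates, which would match the 100% CPU symptom. Below is a minimal sketch, assuming a blocking descriptor, of a helper that waits and then classifies the `read()` result; `pollAndRead`, `ReadStatus` and `ReadResult` are hypothetical names.

```cpp
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

enum class ReadStatus { Data, Timeout, Eof, Error };

struct ReadResult {
    ReadStatus status;
    ssize_t bytes;  // bytes stored in buf (Data only)
    int err;        // saved errno (Error only)
};

// Waits up to timeoutMs, then reads once and classifies the outcome.
// A positive poll() result does NOT guarantee data: on hang-up or error
// read() returns 0 or -1, and that has to be reported to the caller.
ReadResult pollAndRead(int fd, char* buf, std::size_t len, int timeoutMs)
{
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;

    int rc;
    do {
        rc = poll(&pfd, 1, timeoutMs);
    } while (rc < 0 && errno == EINTR);       // retry if a signal interrupted poll()
    if (rc < 0)
        return {ReadStatus::Error, -1, errno};
    if (rc == 0)
        return {ReadStatus::Timeout, 0, 0};   // quiet line: frame is probably complete

    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    if (n > 0)
        return {ReadStatus::Data, n, 0};
    if (n == 0)
        return {ReadStatus::Eof, 0, 0};       // no more data will arrive on this fd
    return {ReadStatus::Error, -1, errno};
}
```

With a helper like this, the inner loop of `listenPort()` would keep reading only on `ReadStatus::Data`, hand the collected buffer to `findFrame()` on `Timeout`, and close and reopen the port on `Eof` or `Error` instead of calling `storeInBuff()` with 0 or -1.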

janm
  • Thank you. I'll add a test on what read returns. The thing is, if read returns 0 or -1, why would all the following calls return 0 or -1 too? Since I want this to run as a daemon, if one failed read call makes every later call fail, I will have to restart the daemon. I sometimes have to log data every 5 seconds, so a restart every 30 minutes or so isn't really acceptable, because restarting takes quite long. Can firmware or FTDI drivers really cause this much trouble? – rmilville Jul 20 '16 at 12:15
  • Once you get 0 or -1, that's probably what you'll keep getting. Once you get EOF, no new data will arrive. When there's an error, it won't go away. You need to figure out what's happening and address that problem. In your code you might be able to close and reopen the device. Also, there's no good reason for a restart to be slow; it should take milliseconds. – janm Jul 20 '16 at 12:25
  • That's what I did: close the fd and reopen the file corresponding to the device. As for the restart, if I kill the process with the kill command and then start it with "service ----- start", it's fast, but with "service ---- restart" it's slow as hell. Since I added a test, I hope I won't need to restart the daemon. – rmilville Jul 20 '16 at 12:37
  • You have multiple problems here. Your code goes into a loop when there's a problem; it doesn't close the fd in that case. That's a bug in your code. Some FTDI devices do not handle ~CRTSCTS correctly, so it may be that the read is blocking because of a device driver bug. You can test this by changing your cable to connect RTS to CTS and connecting DTR, DSR and CD, then see if you still have the problem. "Taking forever" on kill is consistent with a device driver call not returning. – janm Jul 20 '16 at 12:46
  • Sorry, I didn't explain myself properly. When I said that I added a test for EOF or error, that was not in the posted code but on my computer, to test once again. If everything works properly, I'll say so. I don't know much about RS-232 connections, so I'll look into what you pointed out. Thanks. – rmilville Jul 20 '16 at 13:08
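Regarding the suspicion that some FTDI drivers mishandle `CRTSCTS`: POSIX specifies that `tcsetattr()` returns success if it could perform any of the requested changes, so a successful return does not prove that every flag was applied. A sketch that reads the settings back with `tcgetattr()` and compares the fields this logger relies on; `settingsMatch()` and `applyAndVerify()` are hypothetical names, and the chosen set of fields is an assumption.

```cpp
#include <termios.h>

// Compares the parts of two termios structs that the logger depends on.
bool settingsMatch(const termios& want, const termios& got)
{
    const tcflag_t mask = CSIZE | PARENB | CSTOPB | CRTSCTS | CREAD | CLOCAL;
    return (want.c_cflag & mask) == (got.c_cflag & mask)
        && cfgetispeed(&want) == cfgetispeed(&got)
        && cfgetospeed(&want) == cfgetospeed(&got)
        && want.c_cc[VMIN] == got.c_cc[VMIN]
        && want.c_cc[VTIME] == got.c_cc[VTIME];
}

// Applies `want`, then re-reads the driver's settings: returns true only if
// the driver actually kept the fields checked by settingsMatch().
bool applyAndVerify(int fd, const termios& want)
{
    if (tcsetattr(fd, TCSANOW, &want) != 0)
        return false;
    termios got;
    if (tcgetattr(fd, &got) != 0)
        return false;
    return settingsMatch(want, got);
}
```

Logging a mismatch at start-up might show whether the adapter really keeps hardware flow control disabled, before resorting to rewiring the cable.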

So I followed @janm's hints. I deleted this line:

tty.c_cflag &= ~CRTSCTS;

and added a test like this:

rd = read(fd, mes, sizeof(mes));
if (rd > 0)
{
     doSomeStuff();
}
else
{
     // rd == 0 (EOF) or rd == -1 (error): close, reconfigure and listen again
     close(fd);
     serialConfig();
     listenPort();
}

I may lose some data when read fails, but at least the program no longer goes down and doesn't have to be restarted manually.
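The `else` branch above calls `listenPort()` recursively, so every reconnect adds a stack frame (each holding a 1 KB buffer) that never returns. Below is a sketch of the same close-and-reopen idea written as a loop; `readWithReconnect()`, its parameters and the retry delay are hypothetical, and the termios setup step is left as a comment.

```cpp
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <functional>
#include <thread>
#include <unistd.h>

// Hypothetical outer loop replacing the recursive listenPort() call:
// open the port, read until read() reports EOF or an error, close the fd,
// wait a little, and open it again. maxOpenAttempts <= 0 means retry
// forever (daemon mode); a positive value bounds the loop.
// Returns the number of open() attempts made.
int readWithReconnect(const char* path,
                      const std::function<void(const char*, std::size_t)>& onData,
                      int maxOpenAttempts, int retryDelayMs)
{
    int attempts = 0;
    char buf[1024];
    while (maxOpenAttempts <= 0 || attempts < maxOpenAttempts) {
        ++attempts;
        int fd = open(path, O_RDWR | O_NOCTTY);
        if (fd >= 0) {
            // The termios setup (serialConfig() in the question) would go here.
            for (;;) {
                ssize_t n = read(fd, buf, sizeof buf);
                if (n > 0) {
                    onData(buf, static_cast<std::size_t>(n));
                } else if (n < 0 && errno == EINTR) {
                    continue;                   // interrupted by a signal: just retry
                } else {
                    if (n < 0)
                        std::perror("read");    // hard error; n == 0 is EOF
                    break;                      // leave the read loop and reopen
                }
            }
            close(fd);
        } else {
            std::perror("open");
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(retryDelayMs));
    }
    return attempts;
}
```

The short delay between attempts keeps the daemon from spinning at 100% CPU while the device is absent or keeps failing.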

rmilville
  • You need to get to the root cause, meaning you need to check for `rd < 0` and then report the value of `errno`. E.g., see http://stackoverflow.com/questions/6947413/how-to-open-read-and-write-from-serial-port-in-c/38318768#38318768 I fail to see how removing `tty.c_cflag &= ~CRTSCTS` could improve or fix anything. FWIW, I have several FTDI adapters. – sawdust Jul 21 '16 at 08:34
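Because the program runs as a daemon and its terminal output can't be watched, one way to follow this advice is to save `errno` right after `read()` and send it to syslog. A minimal sketch; `initLogging()`, `describeReadFailure()`, `logReadFailure()` and the `"tstocoord"` identifier are hypothetical.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <syslog.h>

// Call once at daemon start-up ("tstocoord" is a hypothetical identifier).
void initLogging()
{
    openlog("tstocoord", LOG_PID, LOG_DAEMON);
}

// Turns a read() result that delivered no data into a readable message.
// `err` must be the errno value saved immediately after read() returned,
// before any other call (including logging) can overwrite errno.
std::string describeReadFailure(ssize_t rd, int err)
{
    if (rd == 0)
        return "read() returned 0 (EOF, e.g. the device hung up)";
    return "read() failed: errno=" + std::to_string(err) + " (" + std::strerror(err) + ")";
}

// Sends the message to syslog, since a daemon has no terminal to print to.
void logReadFailure(ssize_t rd, int err)
{
    syslog(LOG_ERR, "%s", describeReadFailure(rd, err).c_str());
}
```

In `listenPort()` this would be used as `int saved = errno; logReadFailure(rd, saved);` right after a `read()` that returned 0 or -1. On a default Debian rsyslog setup, `LOG_DAEMON` messages usually end up in /var/log/daemon.log and /var/log/syslog.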