I have a Raspberry Pi connected via UART to a microcontroller. The code on the RPI is trying to read incoming non-canonical UART data, but randomly receives POLLHUP. I have been able to recover by closing and reopening the file, but this is less than ideal.

Is there a way to disable the disconnect detection behavior of termios in Linux? I am not sure why the POLLHUP is being raised in the first place. I suspect that some control characters are still being interpreted despite my call to cfmakeraw(). The cable is unlikely to be the problem as canonical debug output works fine (admittedly over different pins, but same baud and same type of cable).

Sample code, setup:

bool UartSocket::setup()
{
    int fd = ::open("/dev/serial0", O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0) // open() returns -1 on failure, not 0
    {
        return false;
    }

    struct termios portSettings;
    ::memset(&portSettings, 0, sizeof(portSettings));
    if (m_rSyscalls.tcgetattr(fd, &portSettings) != 0)
    {
        m_logger.error("tcgetattr() failed, errno = %d.", errno);
        return false;
    }
    m_rSyscalls.cfsetispeed(&portSettings, 115200);
    m_rSyscalls.cfsetospeed(&portSettings, 115200);
    cfmakeraw(&portSettings);

    // Fiddling with more settings out of desperation
    portSettings.c_iflag &= ~IGNBRK; // disable break processing
    portSettings.c_lflag &= ~ICANON;
    portSettings.c_cc[VEOF] = 0;

    if (m_rSyscalls.tcsetattr(fd, TCSANOW, &portSettings) != 0)
    {
        m_logger.error("tcsetattr() failed, errno = %d.", errno);
        return false;
    }

    // Prepare to poll on recv() calls
    m_pollfd.fd = fd;
    m_pollfd.events = POLLIN;

    return true;
}

Sample code, Rx:

ssize_t UartSocket::recv(char* buf, size_t maxRead)
{
    ssize_t readResult = -1;

    int pollResult = ::poll(&m_pollfd, 1, 1000);
    if (pollResult > 0)
    {
        if (m_pollfd.revents & POLLERR)
        {
            int error = 0;
            socklen_t errlen = sizeof(error);
            if (getsockopt(
                        fd,
                        SOL_SOCKET,
                        SO_ERROR,
                        static_cast<void*>(&error),
                        &errlen))
            {
                m_logger.error(
                        "getsockopt failed when trying to diagnose an error.");
            }

            m_logger.error(
                    "Error on uart %s. Error = %d, len = %u.",
                    m_rConfig.getPath().c_str(),
                    error,
                    errlen);
            return -1;
        }

        if (m_pollfd.revents & POLLIN)
        {
            readResult = ::read( //
                    fd,
                    buf,
                    maxRead);
            m_logger.info("readResult = %d.", readResult);
            if (readResult > 0)
            {
                 // Party, we are happy
                 return readResult;
            }
            else if (readResult == 0)
            {
                // empty read..no-op
                m_logger.dump("Got an empty UART read.");
            }
            else
            {
                if (errno == EAGAIN)
                {
                    // No data was available to read; do nothing.
                    readResult = 0;
                    m_logger.dump("Got an empty UART read.");
                }
                else
                {
                    m_logger.error(
                            "Failure reading uart %s, errno = %d.",
                            m_rConfig.getPath().c_str(),
                            errno);
                }
            }
        }

        // We wait for the buffer to empty before handling any hangups
        if ((m_pollfd.revents & POLLHUP) && (readResult == 0))
        {
            m_logger.error("Hangup on uart %s.", m_rConfig.getPath().c_str());
            reopen(); // closes the fd, reopens it and repeats the termios setup
        }
    }
    else if (pollResult == 0)
    {
        // No data was available to read; do nothing.
        readResult = 0;
        m_logger.dump("Got an empty UART poll.");
    }
    else
    {
        m_logger.error("Failure polling uart 0, errno = %d.", errno);
        readResult = -1;
    }
    return readResult;
}

TL;DR: The code above has a branch that handles POLLHUP by closing and reopening the serial device. I am talking to a device that sends raw bytes, and I would prefer that termios in Linux not make the file descriptor unusable after a POLLHUP. Ideally it would also entirely ignore whatever control character is causing this, if it is a control character. Is there a way to do this?

0andriy
Fadeway
  • The simple solution for the POLLHUP problem is to not use **poll()**. You have a multitasking OS that is event-driven (e.g. use HW interrupts), but your program counteracts that by using nonblocking mode and wasting CPU cycles by polling the system receive buffer. *"Fiddling with more settings out of desperation"* -- Basic configuration for termios raw mode is in [this answer](https://stackoverflow.com/questions/12437593/how-to-read-a-binary-data-over-serial-terminal-in-c-program/12457195#12457195). BTW in Linux, you're accessing a *serial terminal*, which is a few layers above a UART. – sawdust Sep 07 '21 at 02:18
  • @sawdust I am only using `poll()` as a guard so I don't have a permanently blocked, non-abortable thread stuck in `read()`. But I'll try it out and if it works then great. – Fadeway Sep 07 '21 at 05:45
  • @sawdust I used your code and it has brought me closer to the problem - the blocking read sometimes returns 0, which seems to mean EOF. My initialization is now the same as your recommendation, but just in case tomorrow I will try sending \x04\n to see if my 'raw' mode is reacting to EOF. If that turns out not to be the problem, do you have any guesses as to what else might cause a blocking read on raw tty to return 0? – Fadeway Sep 08 '21 at 21:07
  • *"the blocking read sometimes returns 0"* -- Then you are not faithfully using my example code. VMIN>0 and VTIME>0 will not return 0. Raw mode ignores the VEOF character. Post your revised code for review. Also study https://stackoverflow.com/questions/25996171/linux-blocking-vs-non-blocking-serial-read/26006680#26006680 Note that nonblocking mode causes VMIN and VTIME to be ignored. – sawdust Sep 08 '21 at 22:30
  • After some trial and error it turned out that I was improperly setting the baud rate. I still have some errors (maybe for a future question..) but fixing the baud rate resolved the POLLHUP. Thanks for sticking around @sawdust, your code was really helpful. – Fadeway Sep 13 '21 at 10:32
  • As for the read returning 0 -> errno was getting set to "Input/output error" and/or "Resource temporarily unavailable", and tests with special characters didn't cause a premature return. So I think the blocking read is properly set up and it's a different issue. But I didn't want to stray too much from the initial question. I'll make another question with a cleaner example based on your code if I can't resolve it :) – Fadeway Sep 13 '21 at 10:47

1 Answer

The POLLHUP issue was resolved by setting the baud rate correctly.

My original code called cfsetispeed(&portSettings, 115200); (and likewise cfsetospeed()). This is wrong: the constant B115200 must be passed instead. B115200 is a speed_t constant whose value is implementation-defined and is generally not the number 115200 (example).

I recommend not copying from my code but rather using this example for a basic raw tty setup.
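
For illustration only, here is a minimal sketch of what the corrected baud-rate setup might look like. It is not the linked example. `applyRaw115200` and `openRawSerial` are hypothetical helper names, and the CLOCAL/CREAD and VMIN/VTIME choices are assumptions rather than settings confirmed by the question.

```cpp
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

// Hypothetical helper: fills `tio` with raw-mode settings at 115200 baud.
// Returns false if either speed call rejects the constant.
bool applyRaw115200(termios& tio)
{
    cfmakeraw(&tio);                 // no canonical mode, echo, signals or translation
    tio.c_cflag |= (CLOCAL | CREAD); // assumption: ignore modem control lines, enable receiver
    tio.c_cc[VMIN] = 1;              // assumption: block until at least one byte arrives
    tio.c_cc[VTIME] = 0;             // assumption: no inter-byte timeout
    // Pass the B115200 constant, not the integer 115200, and check the result.
    return cfsetispeed(&tio, B115200) == 0 && cfsetospeed(&tio, B115200) == 0;
}

// Hypothetical helper: opens `path` in blocking mode and applies the settings.
// Returns the descriptor, or -1 on any failure.
int openRawSerial(const char* path)
{
    int fd = ::open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
    {
        return -1;
    }
    termios tio{};
    if (tcgetattr(fd, &tio) != 0 || !applyRaw115200(tio) ||
        tcsetattr(fd, TCSANOW, &tio) != 0)
    {
        ::close(fd);
        return -1;
    }
    return fd;
}
```

Checking the return values of cfsetispeed() and cfsetospeed(), which the original code ignored, may help surface a rejected speed value early.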

Fadeway