
I'm trying to get libwebsockets running in a multithreaded environment on OS X. I couldn't trigger sending data from a thread other than the main service thread, although the libwebsockets docs implied this should be possible (demo code, mailing list). So I dug into the code and traced the problem to the poll() call.

It seems that poll() behaves differently across platforms with respect to the struct pollfd array passed as its parameter. libwebsockets relies on being able to change the fds.events fields while poll() is active. This works fine on Linux but does not work on OS X.

I wrote a small test program to demonstrate the behaviour:

```cpp
#include <netdb.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <atomic>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <thread>

#define PORT "3490"

struct pollfd fds[1];                // deliberately shared between both threads
std::atomic<bool> connected(false);  // atomic, so the busy-wait below is well-defined

void main_loop() {
    int sockfd, new_fd;
    struct addrinfo hints, *servinfo, *p;
    int yes = 1;
    int rv;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;

    if ((rv = getaddrinfo(NULL, PORT, &hints, &servinfo)) != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
        exit(1);  // exit rather than return, or second_thread() would spin forever
    }

    for (p = servinfo; p != NULL; p = p->ai_next) {
        if ((sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) == -1) {
            perror("server: socket");
            continue;
        }

        if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1) {
            perror("setsockopt");
            exit(1);
        }

        if (bind(sockfd, p->ai_addr, p->ai_addrlen) == -1) {
            close(sockfd);
            perror("server: bind");
            continue;
        }

        break;
    }

    freeaddrinfo(servinfo);

    if (p == NULL) {
        fprintf(stderr, "server: failed to bind\n");
        exit(1);
    }

    if (listen(sockfd, 10) == -1) {
        perror("listen");
        exit(1);
    }

    printf("server: waiting for connections...\n");

    new_fd = accept(sockfd, NULL, NULL);  // peer address is not needed
    if (new_fd == -1) {
        perror("accept");
        exit(1);
    }

    fds[0].fd = new_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    connected = true;

    printf("event is %i\n", fds[0].events);
    // While poll() blocks here, second_thread() rewrites fds[0].events with no
    // synchronization: this unsynchronized access is exactly what is being tested.
    poll(fds, 1, 5000);
    printf("event is %i\n", fds[0].events);  // expecting 1 on Mac and 5 on Linux

    if (send(new_fd, "Hello, world!\n", 14, 0) == -1)
        perror("send");

    close(new_fd);
    close(sockfd);
}

void second_thread() {
    while (!connected) {}  // spin until main_loop() has filled in fds[0]
    sleep(1);
    fds[0].events = POLLIN | POLLOUT;
    printf("set event to %i\n", fds[0].events);
}

int main() {
    std::thread t1(main_loop);
    std::thread t2(second_thread);

    t1.join();
    t2.join();

    return 0;
}
```

Compile on OS X using `clang++ -std=c++11 -stdlib=libc++ -o poll poll.cpp` and on Linux using `g++ -std=c++11 -pthread -o poll poll.cpp`.

The program starts listening on port 3490. If you connect to it (e.g. using `nc localhost 3490`), it polls for input on the main thread while the second thread tries to change the event flags. If nothing is sent, poll() times out and the program exits after 5 seconds.

The output on OS X:

```
server: waiting for connections...
event is 1
set event to 5
event is 1
```

The output on Linux:

```
server: waiting for connections...
event is 1
set event to 5
event is 5
```

So my question is: is there any documentation that explains this behavior? Is what libwebsockets does safe, i.e. is it legal to change fds.events while poll() is active? I couldn't find any details about it in the man pages (OS X, Linux).

heine
  • Since there are no memory barriers, could it [simply] be thread-visibility differences? (If so, I would hesitate to consider either behavior 'well defined'.) – user2864740 Aug 24 '15 at 21:35
  • [poll vs select vs event-based](http://daniel.haxx.se/docs/poll-vs-select.html) – Elliott Frisch Aug 24 '15 at 21:42
  • @user2864740 I thought about that too. But at least changing another variable in one thread and printing it on the other is working fine on both OSes. – heine Aug 24 '15 at 21:46
    Using the same variable in two threads without a mutex is asking for trouble. – Barmar Aug 24 '15 at 21:51
  • @ElliottFrisch I've seen this page before but only just noticed this sentence: "poll() doesn't destroy the input data". So the assumption libwebsockets makes is correct and OS X is behaving weirdly. Is this the right conclusion? – heine Aug 24 '15 at 21:52

1 Answer


You seem to say, at first, that you found some documentation claiming that this is supported and defined behavior. I'd be curious to know where you read that, because I am unable to find anything in either the Linux man page for poll(2) or the POSIX man page for poll() documenting that one thread can change the values in the events array that another thread passed to poll(), and have those changes actually take effect in the original thread's pending poll() call, irrespective of any issues relating to memory barriers and such.

Both man pages appear to me to be completely silent on this subject. They do not indicate whether this is expected, supported, or defined behavior, or whether it is unsupported or undefined.

The proposition that one thread can modify the parameters of a system call issued by another thread after the other thread has ALREADY entered the syscall seems rather counter-intuitive to me. If this were supported behavior, I would expect it to be explicitly documented, and I can't find any reference to it in the Linux or the POSIX man pages.

Having said that: even if I limited the scope of my software to Linux and did not need to care about other platforms, then, given the absence of any documentation, and even if my testing showed the Linux kernel implementing poll(2) this way, I would not expect any guarantee that future kernel versions will continue to behave this way. I would not be able to rely on this behavior, except on the specific kernel build I tested it with.

So, to answer your question: the only documentation that's authoritative on this topic are the man pages in question. They do not explicitly document this as legal behavior; and although they do not explicitly say that this is illegal behavior either, for the reasons stated above, I would consider this to be unsupported, undefined behavior.

Sam Varshavchik
  • Maybe I was a bit unclear in my question. The first part refers to libwebsockets. There are multiple places in the demo code and mailing list postings where the author says it should be safe to send data from a different thread ([demo code](http://git.libwebsockets.org/cgi-bin/cgit/libwebsockets/tree/test-server/test-server.c#n490), [mailing list](http://ml.libwebsockets.org/pipermail/libwebsockets/2015-April/001732.html)). But this leads to the behaviour I point out in my example code. Your post basically supports my assumption that undefined behaviour is involved. – heine Aug 25 '15 at 07:10
  • Correct. The claim that this is documented, legal behavior is clearly wrong. There's nothing in either the Linux or the Unix man pages that explicitly documents this as defined behavior with expected results. The argument has to be reduced to "well, it's not explicitly documented that this does not work, so it must be allowed". Everyone is free to make their own decisions, but I would not base my own code on such an assumption. P.S. I see nothing in the linked mailing list message that claims that this abuse of poll() is valid. – Sam Varshavchik Aug 25 '15 at 10:58
  • He's not claiming this directly. The quote is "That - alone - should be safe from other thread contexts.", which refers to calling "libwebsocket_callback_on_writable()", which manipulates fds.events while it might be in use by poll() in another thread. As libwebsockets itself does not claim to be thread-safe but implies "it might work if you do this and that", this is probably okay. So I guess I found the case where it does not work at all and, as you said, it might break if the implementation of poll() on Linux changes, since it is undefined behaviour you shouldn't rely on. – heine Aug 25 '15 at 11:21