
I'm trying to understand why my CPU pegs to 100% when I use a FIFO in conjunction with g_io_channel.

I have a project on GitHub that demonstrates the problem with the smallest possible setup: a simple server that opens a FIFO in /tmp, and a client that sends a message to the server through that FIFO.

I found:

  1. When first started, the server takes close to zero CPU.
  2. As soon as the client sends a message to the server through the FIFO, the server receives and prints the message, and then the CPU goes to 100%.
  3. You can keep sending messages through the FIFO and the server will print them, but the CPU stays at 100% :-/

I've tried the usual Google and Stack Overflow searches, but so far no luck finding a solution. I hope someone can help me understand what's going on. I believe I am using GLib/GTK correctly, but I'm quite happy to stand corrected. I appreciate any help you can provide. Thanks!

Asked by bp9; edited by liberforce

  • Have you tried profiling the application using a profiling tool (or just interrupting it with `gdb` a few times and seeing what the backtrace looks like)? – Philip Withnall May 01 '18 at 17:55
  • I reproduced your problem, and indeed the program is stuck in the GTK+ main loop. You should install the debugging packages for GTK+ on your distro and run your server in gdb to check what's going on. – liberforce May 02 '18 at 11:52

1 Answer

By running the server under strace, I saw it was just calling poll() in a loop, which kept returning the POLLHUP event. The problem seems to be that you expect to wait indefinitely for G_IO_IN events, which tell you data is coming, but you don't handle G_IO_HUP, which tells you that the connection was closed.
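The busy loop can be reproduced without GLib at all, since the main loop ends up waiting in poll(). The sketch below is Linux-specific and only illustrative: the function name `hup_wakeups_after_client` and the scratch path are hypothetical, not taken from the project. It opens the FIFO read-only on the "server" side, lets one "client" write a message and exit, drains the message, and then counts how many zero-timeout polls report POLLHUP.

```c
#include <fcntl.h>
#include <poll.h>
#include <sys/stat.h>
#include <unistd.h>

/* Linux-specific sketch (hypothetical helper): counts how many of `rounds`
 * zero-timeout polls on a read-only FIFO report POLLHUP after one client
 * has written a message and closed its end. Returns -1 on setup failure. */
static int hup_wakeups_after_client(const char *path, int rounds)
{
    unlink(path);                          /* start with a fresh FIFO */
    if (mkfifo(path, 0600) != 0)
        return -1;

    /* Server side: read-only, non-blocking so open() does not wait. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd < 0)
        return -1;

    /* Client side: connect, send one message, hang up. */
    int wfd = open(path, O_WRONLY | O_NONBLOCK);
    if (wfd < 0 || write(wfd, "hello\n", 6) != 6) {
        if (wfd >= 0)
            close(wfd);
        close(rfd);
        return -1;
    }
    close(wfd);

    /* Drain the message; read() then returns 0 (EOF): no writer is left. */
    char buf[64];
    while (read(rfd, buf, sizeof buf) > 0)
        ;

    /* What the main loop keeps doing: poll() never blocks, because POLLHUP
     * stays set until some writer opens the FIFO again. */
    int hups = 0;
    for (int i = 0; i < rounds; i++) {
        struct pollfd pfd = { .fd = rfd, .events = POLLIN };
        if (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLHUP))
            hups++;
    }

    close(rfd);
    unlink(path);
    return hups;
}
```

On Linux, `hup_wakeups_after_client("/tmp/demo_fifo", 5)` should return 5: every poll reports the hang-up immediately, even though only POLLIN was requested, which is exactly the spinning seen at 100% CPU.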

I then found this comment in the BlueZ code:

if (cond & (G_IO_ERR | G_IO_HUP)) {
        /*
         * Both ends needs to be open simultaneously before proceeding
         * any input or output operation. When the remote closes the
         * channel, hup signal is received on this end.
         */
        fifo_open();
        return FALSE;
}
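The same "reopen on hang-up" idea can be sketched at the poll() level. This is an assumption-laden illustration, not the BlueZ code: `ensure_fifo` and `reopen_fifo_on_hup` are hypothetical helpers, and the note about the fresh descriptor describes Linux behavior. In a GLib program this corresponds to the G_IO_HUP branch of a g_io_add_watch() callback that returns FALSE (removing the old watch) and then sets up a new channel and watch for the new descriptor, as the snippet above does with fifo_open().

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create the FIFO unless it already exists. */
static int ensure_fifo(const char *path)
{
    if (mkfifo(path, 0600) == 0 || errno == EEXIST)
        return 0;
    return -1;
}

/* Hypothetical helper: if poll() reported a hang-up (POLLHUP) or an error
 * (POLLERR), drop the stale read end and open the FIFO again. Returns the
 * descriptor to keep polling, or -1 if reopening fails. */
static int reopen_fifo_on_hup(int fd, short revents, const char *path)
{
    if (!(revents & (POLLHUP | POLLERR)))
        return fd;              /* writer still connected: keep using fd */

    close(fd);
    /* O_NONBLOCK: open() returns at once instead of waiting for a writer.
     * On Linux, the fresh descriptor does not report POLLHUP again until a
     * new writer has connected and gone away. */
    return open(path, O_RDONLY | O_NONBLOCK);
}
```

Drain any pending input before reopening: if the client wrote and exited before the server read, POLLIN and POLLHUP arrive together, and closing the last open descriptor of the FIFO discards whatever was still buffered.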

This Stack Overflow question explains it all: "Poll() on Named Pipe returns with POLLHUP constantly and immediately".

And that's what I saw with strace: when the client process exits (yours, or just a simple echo into your FIFO), the IO channel is notified that the other side of the connection hung up. Your client currently sends a message and closes the connection. So you have three options: keep the connection alive on the client side instead of disconnecting; reopen the FIFO on the server side and add a new watch; or, as mark4o suggests in the SO question I linked, open the FIFO with read/write access on the server side, so that there is always at least one writer on the FIFO (the server itself) and the hang-up never happens.

As an example, the BlueZ code I linked to closes the channel and opens a new one upon disconnection. Be careful to also handle reference counting correctly in your code.
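For the read/write option (the one the asker later reported as the fix), a minimal server-side open might look like the sketch below. The function name `open_fifo_server` is hypothetical, and the behavior relies on Linux: POSIX leaves opening a FIFO with O_RDWR undefined.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical server-side open: because the server also holds the FIFO
 * open for writing, the writer count never drops to zero when a client
 * exits, so poll() never reports a hang-up. POSIX leaves O_RDWR on a FIFO
 * undefined; Linux supports it and never blocks in open() for this mode. */
static int open_fifo_server(const char *path)
{
    if (mkfifo(path, 0600) != 0 && errno != EEXIST)
        return -1;
    /* O_NONBLOCK so read() returns -1/EAGAIN on an empty FIFO instead of
     * waiting: with the server counted as a writer, read() never sees EOF. */
    return open(path, O_RDWR | O_NONBLOCK);
}
```

The descriptor can then be wrapped with g_io_channel_unix_new() and watched for G_IO_IN as before. After a client writes "hello\n" and exits, a read() returns those 6 bytes and the next read() fails with EAGAIN rather than returning 0 (EOF), so the main loop has nothing to wake up for until the next message arrives.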

Answered by liberforce
  • Hi @liberforce, thank you for your help! I did try to post a stack trace from GDB in a comment, but it kept coming out wrapped instead of formatted as code. The upshot, as you observed, is that the server is stuck in __poll_nocancel () inside the main loop. Since I intend to run the client multiple times while the server stays running, I will try opening the FIFO as read/write on the server to see if that fixes the problem. Thank you! I'll mark this as the correct answer once I try your suggestion. – bp9 May 02 '18 at 22:24
  • Problem solved by opening the FIFO as O_RDWR on the server. Selected as the correct answer. Thank you! – bp9 May 02 '18 at 22:28