forkIO seems to block on haskell websocket server

Question

I'm running a Haskell WebSocket server using Wai:

application :: MVar ServerState -> Wai.Application
application state = WaiWS.websocketsOr WS.defaultConnectionOptions wsApp staticApp
  where
    wsApp :: WS.ServerApp
    wsApp pendingConn = do
      conn <- WS.acceptRequest pendingConn 
      talk conn state

To allow a single client to send asynchronous messages, talk is defined as follows:

talk :: WS.Connection -> MVar ServerState -> IO ()
talk conn state = forever $ do
  msg <- WS.receiveMessage conn

  putStrLn "received message"

  successLock <- newEmptyMVar 
  tid <- timeoutAsync successLock $ processMessage conn state msg

  putStrLn "forked thread"

  modifyMVar_ state $ \curState -> 
    return $ curState & threads %~ (M.insert mid tid) -- thread bookkeeping

  putStrLn "modified state"

  putMVar successLock ()
  putStrLn "unlocked success"

  where
    mid                 = serverMessageId msg
    timeoutAsync lock f = forkIO $ do
      timeout S.process_message_timeout onTimeout (onSuccess lock) f
    onSuccess lock      = do
      -- block until the first modifyMVar_ above finishes.
      takeMVar lock
      modifyMVar_ state $ \curState -> 
        return $ curState & threads %~ (M.delete mid) -- thread cleanup
    onTimeout = ...

Here's the thing: when I bombard this server with many CPU-heavy messages (from a single client), the main thread occasionally hangs right after printing "forked thread".

This is surprising because all work on messages is (in theory) done in separate threads, so the main thread (the forever loop) should never block.
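
One way to tell a permanent deadlock from a long pause is to time each step of the loop instead of only printing a marker after it. The sketch below is a minimal helper for that; timed and logTimed are hypothetical names, and it assumes base 4.11 or later for GHC.Clock. It flushes stdout explicitly because stdout is block-buffered when it is not attached to a terminal, which can make the last visible line misleading.

```haskell
import Control.Concurrent (threadDelay)
import GHC.Clock (getMonotonicTime)
import System.IO (hFlush, stdout)

-- Run an action and return its result together with the elapsed
-- wall-clock time in seconds.
timed :: IO a -> IO (a, Double)
timed act = do
  t0 <- getMonotonicTime
  x  <- act
  t1 <- getMonotonicTime
  return (x, t1 - t0)

-- Like `timed`, but prints the elapsed time under the given label
-- and returns only the action's result.
logTimed :: String -> IO a -> IO a
logTimed label act = do
  (x, dt) <- timed act
  putStrLn (label ++ ": " ++ show dt ++ " s")
  hFlush stdout
  return x

main :: IO ()
main = logTimed "sleep 50 ms" (threadDelay 50000)
```

In talk, the call to modifyMVar_ could be wrapped as logTimed "modifyMVar_" (modifyMVar_ state ...) to see how long the main thread actually waits for the state MVar.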

What's going on here?

[EDIT]

A minimal, complete, verifiable example is hard to provide in this case (the work is done in processMessage, which comprises a lot of moving parts, any of which might be the problem). Instead, I'm looking for high-level pointers to things I could investigate.

Here is data from an example run (send the server an expensive request, then a bunch of smaller less-expensive ones):

gc productivity 36%: http://puu.sh/nSxnj/d8bb5995ae.png
event log (compiled with -eventlog, run with +RTS -ls): http://puu.sh/nSxDy/efe457bee2.eventlog
CPU usage is ~300% (with 4 capabilities), which made me think GC might be competing with other OS processes for cores; I decreased the number of capabilities to n-1, and this seemed to improve responsiveness
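
The "n-1 capabilities" setting mentioned above is normally chosen with +RTS -N, but it can also be applied from inside the program. The following is a minimal sketch; capsFor is a hypothetical helper, and the change only has an effect when the program is linked with -threaded.

```haskell
import Control.Concurrent (getNumCapabilities, setNumCapabilities)
import GHC.Conc (getNumProcessors)

-- Leave one core free for the rest of the system, but never go
-- below a single capability.
capsFor :: Int -> Int
capsFor cores = max 1 (cores - 1)

main :: IO ()
main = do
  cores <- getNumProcessors
  setNumCapabilities (capsFor cores)
  caps <- getNumCapabilities
  putStrLn ("cores: " ++ show cores ++ ", capabilities: " ++ show caps)
```

On a 4-core machine this corresponds to running with +RTS -N3.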

Also, the app has the following properties, which I think are potential causes of the problem:

The ratio of garbage-collected to live data is high: processMessage basically constructs a giant list, which is encoded with aeson and sent back to the user but not kept in state.
Many foreign calls are made on a single request (due to ZMQ, which IIRC makes unsafe foreign calls).
ThreadScope tells me that lots of heap overflows occur, causing GC requests.
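
The GC figures above can also be sampled from inside the running process, which makes it easier to correlate them with individual requests. This is a sketch only: it assumes base 4.10 or later (the GHC.Stats.getRTSStats interface) and that the program is started with +RTS -T; productivity is a hypothetical helper that computes mutator time as a fraction of mutator plus GC time.

```haskell
import Data.Int (Int64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Fraction of elapsed time spent in the mutator rather than in GC.
-- Both arguments are in nanoseconds; returns 1 when no time has been recorded.
productivity :: Int64 -> Int64 -> Double
productivity mutNs gcNs
  | total == 0 = 1
  | otherwise  = fromIntegral mutNs / fromIntegral total
  where
    total = mutNs + gcNs

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "run with +RTS -T to enable RTS statistics"
    else do
      s <- getRTSStats
      putStrLn ("GCs: " ++ show (gcs s) ++ " (major: " ++ show (major_gcs s) ++ ")")
      putStrLn ("max live bytes: " ++ show (max_live_bytes s))
      putStrLn ("productivity: "
                ++ show (productivity (mutator_elapsed_ns s) (gc_elapsed_ns s)))
```

Sampling these counters before and after a single expensive request would show how many collections that request triggers.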

are you ever doing a `putMVar` or `takeMVar` on your `state` MVar outside of this code? — jberryman, Mar 16 '16 at 20:26
you may need to provide a self-contained runnable example if you want help debugging this — jberryman, Mar 16 '16 at 21:36
Where does the computation happen? I don't see it in your example. — sclv, Mar 17 '16 at 17:45
To get help, you'll need a minimum complete verifiable example: http://stackoverflow.com/help/mcve — sclv, Mar 21 '16 at 01:28
Unfortunately it's quite difficult to provide an mcve because there are tons of moving parts in `processMessage` and I don't really know which direction to look yet. I've provided data from a sample run that might indicate weird behavior I'm not seeing. — anand, Mar 24 '16 at 19:49
when you say "hangs" do you mean permanently like a deadlock? Or just pauses longer than you'd expect between "forked thread" and "modified state"? — sclv, Mar 25 '16 at 03:04
