
I have an Elixir/Phoenix server app, and the clients connect through the built-in channels system via websockets. Now I want to detect when a user leaves a channel.

Sidenote: I'm using the JavaScript client library inside a Google Chrome extension. For this I extracted the ES6 code from Phoenix, transpiled it to JavaScript, and tweaked it a little so it runs standalone.

Now when I just close the popup, the server immediately triggers the terminate/2 function with reason = {:shutdown, :closed}. There is no kind of close-callback involved on the extension side, so this is great!

But when the client simply loses its network connection (I connected a second computer and just pulled out the network plug), terminate/2 is not triggered.

Why and how do I fix this?

I played around with the timeout option of transport :websocket, Phoenix.Transports.WebSocket, but this did not work out.

Update: With the new awesome Phoenix 1.2 Presence stuff, this should not be needed anymore.

Philip Claren
  • I just noticed that the server does not always recognize when the popup was closed. So I hope a solution for my question will solve this as well. – Philip Claren Nov 26 '15 at 08:50
  • Note that Presence would not work when the **last** user subscribed to the channel loses their connection. See https://stackoverflow.com/questions/53986369/presence-not-picking-up-user-leave-events/53998659 It might not be the right solution. – xji Jan 01 '19 at 20:30
  • Also see https://elixirforum.com/t/how-to-create-a-process-subscribed-to-a-channel-topic-to-watch-the-handle-diff-events/2869 Apparently even with Presence you'd still need an external process to monitor the channel, so the answer here is still very relevant. – xji Jan 02 '19 at 22:51

1 Answer

The proper way to do this is to not trap exits in your channel, and instead have another process monitor you. When you go down, it can invoke a callback. Below is a snippet to get you started:

# lib/my_app.ex

children = [
  ...
  worker(ChannelWatcher, [:rooms])
]

# web/channels/room_channel.ex

def join("rooms:" <> id, _params, socket) do
  uid = socket.assigns.user_id
  :ok = ChannelWatcher.monitor(:rooms, self(), {__MODULE__, :leave, [id, uid]})

  {:ok, socket}
end

def leave(room_id, user_id) do
  # handle user leaving
end

# lib/my_app/channel_watcher.ex

defmodule ChannelWatcher do
  use GenServer

  ## Client API

  def monitor(server_name, pid, mfa) do
    GenServer.call(server_name, {:monitor, pid, mfa})
  end

  def demonitor(server_name, pid) do
    GenServer.call(server_name, {:demonitor, pid})
  end

  ## Server API

  def start_link(name) do
    GenServer.start_link(__MODULE__, [], name: name)
  end

  def init(_) do
    Process.flag(:trap_exit, true)
    {:ok, %{channels: HashDict.new()}}
  end

  def handle_call({:monitor, pid, mfa}, _from, state) do
    Process.link(pid)
    {:reply, :ok, put_channel(state, pid, mfa)}
  end

  def handle_call({:demonitor, pid}, _from, state) do
    case HashDict.fetch(state.channels, pid) do
      :error       -> {:reply, :ok, state}
      {:ok,  _mfa} ->
        Process.unlink(pid)
        {:reply, :ok, drop_channel(state, pid)}
    end
  end

  def handle_info({:EXIT, pid, _reason}, state) do
    case HashDict.fetch(state.channels, pid) do
      :error -> {:noreply, state}
      {:ok, {mod, func, args}} ->
        Task.start_link(fn -> apply(mod, func, args) end)
        {:noreply, drop_channel(state, pid)}
    end
  end

  defp drop_channel(state, pid) do
    %{state | channels: HashDict.delete(state.channels, pid)}
  end

  defp put_channel(state, pid, mfa) do
    %{state | channels: HashDict.put(state.channels, pid, mfa)}
  end
end

In newer versions of Elixir, HashDict is deprecated in favor of Map. The same example for newer codebases is:

# lib/my_app.ex

children = [
  ...
  worker(ChannelWatcher, [:rooms])
]
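
A note on the supervision entry above: in recent Elixir releases the `Supervisor.Spec` helpers such as `worker/2` are deprecated in favor of child specifications. The following is a minimal sketch of the same idea using a map-based child spec. `WatcherStub` and the `:rooms_watcher` id are hypothetical names introduced only to keep the example self-contained; in a real app the `start` tuple would point at `ChannelWatcher`.

```elixir
defmodule WatcherStub do
  # Hypothetical stand-in for ChannelWatcher so this sketch runs on its own.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, [], name: name)

  @impl true
  def init(_), do: {:ok, %{}}
end

children = [
  # Map-based child spec; same intent as worker(ChannelWatcher, [:rooms]).
  %{id: :rooms_watcher, start: {WatcherStub, :start_link, [:rooms]}}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
```

Because `use GenServer` also defines a default `child_spec/1`, the shorter tuple form `{WatcherStub, :rooms}` should start the same process.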

# web/channels/room_channel.ex

def join("rooms:" <> id, _params, socket) do
  uid = socket.assigns.user_id
  :ok = ChannelWatcher.monitor(:rooms, self(), {__MODULE__, :leave, [id, uid]})

  {:ok, socket}
end

def leave(room_id, user_id) do
  # handle user leaving
end

# lib/my_app/channel_watcher.ex

defmodule ChannelWatcher do
  use GenServer

  ## Client API

  def monitor(server_name, pid, mfa) do
    GenServer.call(server_name, {:monitor, pid, mfa})
  end

  def demonitor(server_name, pid) do
    GenServer.call(server_name, {:demonitor, pid})
  end

  ## Server API

  def start_link(name) do
    GenServer.start_link(__MODULE__, [], name: name)
  end

  def init(_) do
    Process.flag(:trap_exit, true)
    {:ok, %{channels: Map.new()}}
  end

  def handle_call({:monitor, pid, mfa}, _from, state) do
    Process.link(pid)
    {:reply, :ok, put_channel(state, pid, mfa)}
  end

  def handle_call({:demonitor, pid}, _from, state) do
    case Map.fetch(state.channels, pid) do
      :error       -> {:reply, :ok, state}
      {:ok,  _mfa} ->
        Process.unlink(pid)
        {:reply, :ok, drop_channel(state, pid)}
    end
  end

  def handle_info({:EXIT, pid, _reason}, state) do
    case Map.fetch(state.channels, pid) do
      :error -> {:noreply, state}
      {:ok, {mod, func, args}} ->
        Task.start_link(fn -> apply(mod, func, args) end)
        {:noreply, drop_channel(state, pid)}
    end
  end

  defp drop_channel(state, pid) do
    %{state | channels: Map.delete(state.channels, pid)}
  end

  defp put_channel(state, pid, mfa) do
    %{state | channels: Map.put(state.channels, pid, mfa)}
  end
end
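
The watcher above links to each channel and traps exits. Because links are bidirectional, a crash of the watcher itself would also propagate to the linked channel processes. A possible variant, not part of the original answer, uses `Process.monitor/1`, which is one-directional. This is a minimal sketch under that assumption; the module name `MonitorWatcher` is hypothetical.

```elixir
defmodule MonitorWatcher do
  use GenServer

  ## Client API

  def start_link(name) do
    GenServer.start_link(__MODULE__, [], name: name)
  end

  def monitor(server_name, pid, mfa) do
    GenServer.call(server_name, {:monitor, pid, mfa})
  end

  ## Server callbacks

  @impl true
  def init(_) do
    # State maps each monitor reference to the callback to run on exit.
    {:ok, %{}}
  end

  @impl true
  def handle_call({:monitor, pid, mfa}, _from, refs) do
    ref = Process.monitor(pid)
    {:reply, :ok, Map.put(refs, ref, mfa)}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, refs) do
    case Map.pop(refs, ref) do
      {nil, refs} ->
        # Unknown reference; nothing to do.
        {:noreply, refs}

      {{mod, func, args}, refs} ->
        # Unlinked task, so a failing callback does not crash the watcher.
        Task.start(fn -> apply(mod, func, args) end)
        {:noreply, refs}
    end
  end
end
```

Usage from a channel would look the same as before, for example `MonitorWatcher.monitor(:rooms, self(), {__MODULE__, :leave, [id, uid]})`, assuming the watcher was started under the name `:rooms`.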
Meeh
Chris McCord
  • I got it to work and I have two follow-up questions: 1. Why isn't this in the core (it's just too good)? 2. The time between the network disconnect and leave getting triggered is around 90 seconds. Is this in any way customizable? (I thought of setting the transport timeout to, say, 20 seconds and pinging the server every 10 seconds, but of course that burns additional resources.) – Philip Claren Nov 26 '15 at 17:27
  • This solution is great. The only problem is that I do some DB actions in the `leave` function, and this issue https://stackoverflow.com/questions/38335635/ecto-2-0-sql-sandbox-error-on-tests happens in tests. – xji Feb 05 '19 at 23:09
  • Good that the issue is finally solved by Elixir v1.8.0 and DBConnection v2.0.4 https://twitter.com/plataformatec/status/1091300824251285504 – xji Feb 09 '19 at 22:31