
Major edit: Since originally finding this issue I have whittled it down to the example below. I think this is now a marginally more precise description of the problem; comments on the original post may therefore not correlate entirely.

Edit: A lightly modified version has been posted in the rails and puma projects: https://github.com/rails/rails/issues/21209, https://github.com/puma/puma/issues/758

Edit: Now reproduced on OS X and with Rainbows.

Summary: When using Puma and serving long-running connections I am consistently receiving errors related to ActiveRecord connections crossing threads. This manifests itself in messages like `message type 0x## arrived from server while idle` and a locked (crashed) server.

The setup:

  • Ubuntu 15 / OSX Yosemite
  • PostgreSQL (9.4) / MySQL (mysqld 5.6.25-0ubuntu0.15.04.1)
  • Ruby - MRI 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux] / Rubinius rbx-2.5.8
  • Rails (4.2.3, 4.2.1)
  • Puma (2.12.2, 2.11)
  • pg (pg-0.18.2) / mysql2

Note: not all combinations of the above versions have been tried; the first listed version is what I'm currently testing against.

  • `rails new issue-test`
  • Add a route: `get 'events' => 'streaming#events'`
  • Add a controller: `streaming_controller.rb`
  • Set up the database (`pool: 2`, but seen with different pool sizes)
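For completeness, the relevant `config/database.yml` entry might look like the following (a sketch only; the adapter, database name and credentials are placeholders, the pool size is the one described above):

```yaml
production:
  adapter: postgresql      # or mysql2 for the MySQL variant
  database: issue_test_production
  username: issue_test
  password: <%= ENV["DB_PASSWORD"] %>
  host: localhost
  pool: 2                  # pool size used in the reproduction
```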

Code:

class StreamingController < ApplicationController

  include ActionController::Live

  def events
    begin
      response.headers["Content-Type"] = "text/event-stream"
      sse = SSE.new(response.stream)
      sse.write({ :data => 'starting' }, { :event => :version_heartbeat })
      ActiveRecord::Base.connection_pool.release_connection
      while true do
        ActiveRecord::Base.connection_pool.with_connection do |conn|
          ActiveRecord::Base.connection.query_cache.clear
          logger.info 'START'
          conn.execute 'SELECT pg_sleep(3)'
          logger.info 'FINISH'
          sse.write({ :data => 'continuing' }, { :event => :version_heartbeat })
          sleep 0.5
        end
      end
    rescue IOError
    rescue ClientDisconnected
    ensure
      logger.info 'Ensuring event stream is closed'
      sse.close
    end
    render nothing: true
  end
end
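To make the intended lifecycle explicit, here is a minimal pure-Ruby model of the pattern the loop relies on (illustrative only: `TinyPool` is a stand-in I wrote for this sketch, not ActiveRecord's pool): each iteration checks a connection out, uses it, and always returns it, so a connection is only held for the duration of the query, not the lifetime of the request.

```ruby
# Minimal model of a connection pool: connections live in a SizedQueue;
# with_connection checks one out, yields it, and always checks it back
# in via ensure -- the semantics the controller loop depends on.
class TinyPool
  def initialize(size)
    @queue = SizedQueue.new(size)
    size.times { |i| @queue << "conn-#{i}" }
  end

  def with_connection
    conn = @queue.pop        # blocks if the pool is exhausted
    yield conn
  ensure
    @queue << conn if conn   # always return the connection
  end

  def available
    @queue.size
  end
end

pool = TinyPool.new(2)

# Per-iteration checkout, as in the controller loop: the connection is
# held only while the "query" runs, so other request threads can use
# the pool between heartbeats.
3.times do
  pool.with_connection do |conn|
    # conn.execute 'SELECT pg_sleep(3)' would go here
  end
end

puts pool.available  # => 2, every connection was returned
```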

Puma configuration:

workers 1
threads 2, 2
#...
bind "tcp://0.0.0.0:9292"

#...
activate_control_app

on_worker_boot do
  require "active_record"
  ActiveRecord::Base.connection.disconnect! rescue ActiveRecord::ConnectionNotEstablished
  ActiveRecord::Base.establish_connection(YAML.load_file("#{app_dir}/config/database.yml")[rails_env])
end

  • Run the server: `puma -e production -C path/to/puma/config/production.rb`

Test script:

#!/bin/bash

timeout 30 curl -vS http://0.0.0.0:9292/events &
timeout 5 curl -vS http://0.0.0.0:9292/events &
timeout 30 curl -vS http://0.0.0.0:9292/events

This reasonably consistently results in a complete lock-up of the application server (with PostgreSQL; see notes). The scary messages come from libpq:

message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle
message type 0x54 arrived from server while idle

In the real world I have quite a few extra elements and the issue presents itself at random. My research indicates that this message comes from libpq and is shorthand for 'communication problem, possibly a connection being used from different threads'. Finally, while writing this up, I had the server lock up without a single message in any log.
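For reference, those hex values are single-byte message tags from the PostgreSQL frontend/backend wire protocol (names per the protocol documentation). Decoding them shows they are the ordinary backend responses to a SELECT, arriving on a connection that libpq believes is idle, which is consistent with one thread's query results turning up on a connection another thread thinks it owns:

```ruby
# PostgreSQL wire-protocol message tags (backend -> frontend), per the
# frontend/backend protocol documentation.
TAGS = {
  0x54 => "RowDescription",   # 'T' - column metadata for a result set
  0x44 => "DataRow",          # 'D' - one row of data
  0x43 => "CommandComplete",  # 'C' - query finished
  0x5a => "ReadyForQuery"     # 'Z' - backend ready for the next query
}

[0x44, 0x43, 0x5a, 0x54].each do |tag|
  puts format("0x%02x ('%s') => %s", tag, tag.chr, TAGS[tag])
end
```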

So, the question(s):

  1. Is the pattern I'm following not legal in some way? What have I mis[sed|understood]?
  2. What is the 'standard' for working with database connections here that should avoid these problems?
  3. Can you see a way to reliably reproduce this?

or

  1. What is the underlying issue here and how can I solve it?

MySQL

If running MySQL, the message is a bit different, and the application recovers (though I'm not sure whether it is then in some undefined state):

F, [2015-07-30T14:12:07.078215 #15606] FATAL -- : 
ActiveRecord::StatementInvalid (Mysql2::Error: This connection is in use by: #<Thread:0x007f563b2faa88@/home/dev/.rbenv/versions/2.2.2/lib/ruby/gems/2.2.0/gems/actionpack-4.2.3/lib/action_controller/metal/live.rb:269 sleep>: SELECT  `tasks`.* FROM `tasks`  ORDER BY `tasks`.`id` ASC LIMIT 1):
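The mysql2 error above is the driver detecting exactly this cross-thread use. A toy model of that guard (purely illustrative, written for this sketch; it is not mysql2's actual implementation) reproduces the shape of the failure:

```ruby
# Toy model of a driver-level ownership check: the connection records
# which thread is mid-query and rejects use from any other thread.
class GuardedConnection
  def execute(sql)
    if @owner && @owner != Thread.current
      raise "This connection is in use by: #{@owner.inspect}: #{sql}"
    end
    @owner = Thread.current
    sleep 0.2          # pretend the query takes a while
    sql
  ensure
    # Only the owning thread releases the connection.
    @owner = nil if @owner == Thread.current
  end
end

conn = GuardedConnection.new

slow = Thread.new { conn.execute("SELECT pg_sleep(3)") }
sleep 0.05             # let the slow query get underway

begin
  conn.execute("SELECT * FROM tasks LIMIT 1")  # second thread, same conn
rescue RuntimeError => e
  puts e.message.split(":").first  # => "This connection is in use by"
end

slow.join
```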
button

  • From the ActiveRecord docs, using `with_connection` relies on the block passed as an argument completing. Are you sure it completes? How about handling the connection with `Base.connection` or `checkout`? – Grasshopper Jul 23 '15 at 11:21
  • @Grasshopper - thanks! I am worried that this will hold the connection open for the lifetime of the request (hours), thus eating my connection pool pretty quickly. I suppose the way that it might not complete is if `sse.write` blocks for some reason and the thread just sits there, i.e. if the connection has gone and it doesn't return for some reason (that said, I'm not sure that fully explains the thread-related messages from libpq). Will experiment with a few things in that direction. – button Jul 23 '15 at 12:11
  • Indeed, the problem you describe can happen in scenarios where the connections are not being released. Can you try to remove the call to `sse.write` from the `with_connection` block? – Grasshopper Jul 23 '15 at 13:47
  • Another question is: do you really need to initiate the connection? From the AR docs it seems like `User.find` would do it for you. – Grasshopper Jul 23 '15 at 13:55
  • @Grasshopper - made another update to the question. My rationale behind initiating the connection explicitly was that, looking at the `with_connection` code, it wraps the block in an `ensure` [close connection], which I'd hoped would ensure the connection was closed(!). I need to release the connection before entering the endless Redis call-loop to avoid exhausting the connection pool. Presumably 'just accessing' will have similar effects, but I'd want to ensure `release_connection` on each message. Will check it out. – button Jul 23 '15 at 14:37
  • The first output looks like the Ruby driver is trying to init a prepared statement twice, which will throw a duplicate_prepared_statement. Again, I can see how parallel asynchronous calls may cause this kind of behavior. What I find strange is that, typically, ORM frameworks handle the connection for you, so why not let AR do the job? What happens if you remove the call to `with_connection` altogether? – Grasshopper Jul 23 '15 at 19:22
  • Unfortunately AR cannot do the job on its own without some hints (owing to the long-running connection). I commented out the `with_connection` bit and was able to get the same problem. Removing `release_connection` just meant that the pool limit was hit as soon as I had as many browsers open as the pool size (which is not ideal!). I can't help but feel that this could well be a bug/incompatibility somewhere between Puma, Rails, AR and Ruby - something like a problem with the way threads are handled. – button Jul 25 '15 at 15:26
  • Why don't you move all of this into the background, and shove the `_user_id` into a queue (which could be Redis or RabbitMQ) that something in the background would consume from? I am not sure you want to open and close connections based on web requests, else you would run out of sockets pretty fast. – bjhaid Jul 26 '15 at 12:31
  • Hi @bjhaid, thanks for the input. I added an update to the bottom of the post. What I had thought (hoped) would happen is that the connection pool would, in effect, handle this. I can see the problem with running out of DB connections, but that wasn't a concern at the moment. The way I can see the queue operating is to have a list of current subscriptions there; something like `queue-of-updates` + `list-of-subscriptions` => `worker-processing-data` => `data-to-subscribers` => `sse.write(whatever-is-received)` – button Jul 26 '15 at 15:05
  • Had an issue in Python multithreading with while loops. It went away when I changed to a method where, when my function ended, it was automatically called again after a set time. Just a thought. – Ninga Dec 08 '15 at 03:09
  • In your boot, try `connection_pool.disconnect` rather than just `connection.disconnect`. – chris Apr 22 '17 at 06:38

1 Answer


Warning: read 'answer' as 'seems to make a difference'


I don't see the issue happen if I change the controller block to look like:

begin
  #...
  while true do
    t = Thread.new do #<<<<<<<<<<<<<<<<<
      ActiveRecord::Base.connection_pool.with_connection do |conn|
        #...
      end
    end
    t.join #<<<<<<<<<<<<<<<<<
  end
  #...
rescue IOError
#...

But I don't know whether this has actually solved the problem or just made it extremely unlikely. Nor can I really fathom why this would make a difference.
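That said, a pure-Ruby illustration (no ActiveRecord involved) shows what the change does guarantee: each iteration's block runs start to finish on one fresh thread, and `join` means iterations never overlap, so the checkout and checkin inside `with_connection` always happen on the same thread.

```ruby
# Each iteration spawns a dedicated thread and waits for it, so any
# per-thread state (like a checked-out connection) lives and dies on
# a single thread per iteration.
owners = []

3.times do
  t = Thread.new do
    owners << Thread.current   # record which thread ran the block
  end
  t.join                       # block fully completes before next loop
end

puts owners.length                 # => 3: every iteration ran
puts owners.uniq.length            # => 3: each on its own fresh thread
puts owners.include?(Thread.main)  # => false: never the request thread
```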

Posting this as a solution in case it helps, but still digging into the issue.

button
  • So I have a long-running process too, and this is how I use it too. My question is: why do you start a thread for each iteration? Doesn't it make sense to start the thread and check out the connection for each client outside the loop? – Carlos Roque Oct 20 '17 at 16:38