So I am experimenting with a situation where I want to stream huge files from a third-party URL, through my server, to the requesting client.
So far I have tried implementing this with Curb or Net::HTTP by adhering to the standard Rack practice of "eachable" response bodies, like so:
class StreamBody
  ...
  def each
    some_http_library.on_body do |body_chunk|
      yield(body_chunk)
    end
  end
end
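To make the pseudocode above concrete, here is a minimal sketch of such a body built on Net::HTTP. It assumes the upstream is a plain GET without redirects or authentication; the constructor argument and the example URL in the comment are made up for illustration.

```ruby
require "net/http"
require "uri"

# Rack-style "eachable" body that relays an upstream download chunk by chunk.
class StreamBody
  def initialize(url)
    @uri = URI(url)
  end

  def each
    Net::HTTP.start(@uri.host, @uri.port, use_ssl: @uri.scheme == "https") do |http|
      http.request_get(@uri.request_uri) do |response|
        # read_body with a block yields fragments as they are read from the
        # socket, so the whole file is never held in memory at once.
        response.read_body { |chunk| yield chunk }
      end
    end
  end
end

# Hypothetical usage:
#   StreamBody.new("https://example.com/huge.bin").each { |chunk| out.write(chunk) }
```

The Rack handler then calls `each` and writes every yielded chunk to the client socket.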
However, I cannot make this system use less than, say, 40% CPU (on my MacBook Air). If I try to do the same with Goliath, using em-synchrony (as advised on the Goliath page), I can get the CPU usage down to about 25%, but I cannot manage to flush the headers. My streaming download "hangs" in the requesting client, and the headers only show up once the entire response has been sent to the client, no matter what headers I supply.
Am I correct in thinking that this is one of those cases where Ruby just sucks marvelously and I have to turn to the go's and nodejs'es of the world instead?
By comparison, we currently use PHP streaming from CURL to the PHP output stream and that works with very little CPU overhead.
Or is there an upstream proxying solution that I could ask to handle my stuff? The problem is that I want to reliably call a Ruby function once the entire body has been sent to the socket, and things like nginx proxies will not do that for me.
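For reference, the kind of hook I mean could be sketched as a wrapper around the response body. It relies on the Rack rule that the server calls `close` on the body after iterating it. `NotifyingBody` and its block argument are hypothetical names, and "sent" here only means handed to the socket by the server, not received by the client.

```ruby
# Wraps any eachable Rack body and runs a callback when the server closes it.
class NotifyingBody
  def initialize(body, &on_complete)
    @body = body
    @on_complete = on_complete
  end

  # Delegate iteration untouched so streaming behaviour stays the same.
  def each(&block)
    @body.each(&block)
  end

  # Rack servers call close after iteration, which makes it a natural hook.
  def close
    @body.close if @body.respond_to?(:close)
    @on_complete.call
  end
end
```

Note that `close` is also called when the client disconnects halfway, so the callback would need its own check if it must only run on complete downloads.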
UPDATE: I have tried to do a simple benchmark for HTTP clients, and it looks like most of the CPU usage comes from the HTTP client libraries. There are benchmarks for Ruby HTTP clients, but they are based on response receive times, whereas CPU usage is never mentioned. In my test I performed a streamed HTTP download, writing the result to /dev/null, and got a consistent 30-40% CPU usage, which roughly matches the CPU usage I see when streaming through any Rack handler.
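A rough way to put a number on this is to compare process CPU time with wall-clock time around the download. This is only a sketch: `cpu_share` is a made-up helper, and the loop writing to the null device stands in for the real streamed download.

```ruby
# Returns the fraction of wall-clock time this process spent on the CPU
# while the block was running (user + system time).
def cpu_share
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
  yield
  wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start
  cpu  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_start
  cpu / wall
end

# Stand-in workload: push 16 KB chunks into the null device.
share = cpu_share do
  File.open(File::NULL, "wb") do |sink|
    chunk = "x" * 16_384
    1_000.times { sink.write(chunk) }
  end
end
puts format("CPU share: %.0f%%", share * 100)
```

Replacing the stand-in loop with the actual HTTP client call gives a figure comparable to what a process monitor shows.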
UPDATE: It turns out that most Rack handlers (Unicorn etc.) use a write() loop on the response body, which might enter a busy wait (with high CPU load) when the response cannot be written fast enough. This can be mitigated to a degree by using rack.hijack and writing to the output socket using write_nonblock and IO.select (I am surprised the servers do not do that by themselves).
lambda do |socket|
  begin
    rack_response_body.each do |chunk|
      begin
        bytes_written = socket.write_nonblock(chunk)
        # If we could write only partially, make sure we do a retry on the next
        # iteration with the remaining part
        if bytes_written < chunk.bytesize
          # byteslice, because String#[] counts characters rather than bytes
          chunk = chunk.byteslice(bytes_written..-1)
          raise Errno::EINTR
        end
      rescue IO::WaitWritable, Errno::EINTR # The output socket is saturated.
        IO.select(nil, [socket]) # Then let's wait on the socket to be writable again
        retry # and off we go...
      rescue Errno::EPIPE # Happens when the client aborts the connection
        return
      end
    end
  ensure
    socket.close rescue IOError
    rack_response_body.close if rack_response_body.respond_to?(:close)
  end
end
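For completeness, this is roughly how such a lambda can be handed to the server via partial hijacking: it goes into the response headers under the `rack.hijack` key, and a server that supports hijacking sends the status and headers itself and then calls it with the client socket. The sketch below is a simplified variant of the loop above; `write_fully` and `hijacking_app` are hypothetical names, and it assumes the server actually supports hijacking.

```ruby
require "socket"

# Writes one chunk completely, sleeping in IO.select instead of spinning
# whenever the socket cannot take more data.
def write_fully(socket, chunk)
  until chunk.empty?
    begin
      written = socket.write_nonblock(chunk)
      chunk = chunk.byteslice(written..-1) # drop the bytes already sent
    rescue IO::WaitWritable, Errno::EINTR
      IO.select(nil, [socket])
      retry
    end
  end
end

# Builds a Rack app that streams `upstream_body` (any eachable body).
def hijacking_app(upstream_body)
  lambda do |env|
    streamer = lambda do |socket|
      begin
        upstream_body.each { |chunk| write_fully(socket, chunk) }
      rescue Errno::EPIPE, Errno::ECONNRESET
        # The client aborted the connection; nothing more to send.
      ensure
        socket.close unless socket.closed?
        upstream_body.close if upstream_body.respond_to?(:close)
      end
    end
    # The regular body is left empty because the lambda does the writing.
    headers = { "Content-Type" => "application/octet-stream", "rack.hijack" => streamer }
    [200, headers, []]
  end
end
```

A production version would probably first check `env["rack.hijack?"]` and fall back to a normal eachable body when the server does not offer hijacking.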