We're running a production system on Crystal/Kemal. The calling service quite often sees a Connection refused error. How can I get more insight/metrics into a running instance of HTTP::Server/Kemal? I'm referring to the number of fibers running/waiting (out of the maximum number allowed), how large the backlog of connections is, how many connections have been refused, and so on.
-
Are you using an SSL server? – rogerdpack Aug 06 '20 at 14:20
1 Answer
Built-in tools (see crystal tool -h):
- context: show context for given location
- expand: show macro expansion for given location
- format: format project, directories and/or files
- hierarchy: show type hierarchy
- implementations: show implementations for given call in location
- types: show type of main variables
Common tools:
- lsof +p $(pidof <process_name>): display connections/sockets for the process.
- ss -ier: display internal socket stats.
- strace -p $(pidof <process_name>) -s 300 -yyfq: useful tool for process introspection.
- tcpdump & wireshark: dump and explore network packets.
- ngrep: like grep but for network packets.
- LLDB: native debugger for LLVM-based apps (tutorial).
- CodeLLDB: native VSCode debugger based on LLDB.
And don't forget crystal build ./app.cr --debug
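
Regarding "how many have been refused": Crystal itself is not known here to expose such a counter, but on Linux the kernel keeps system-wide TcpExt counters for connections dropped at a listening socket. The sketch below reads `ListenOverflows` and `ListenDrops` from `/proc/net/netstat`. The function name `tcpext_counters` is made up for illustration, and the counters cover every listening socket on the host, not only the Kemal process. A rising value hints at a full accept queue, but it does not prove that this is what causes the Connection refused errors.

```sh
# Print "name value" for the TcpExt listen-queue counters.
# Assumption: Linux, where /proc/net/netstat holds a "TcpExt:" header line
# followed by a "TcpExt:" line with the matching values.
# tcpext_counters is a hypothetical helper name, not part of any tool.
tcpext_counters() {
  file="${1:-/proc/net/netstat}"
  awk '
    # First TcpExt line: remember the counter names by column.
    $1 == "TcpExt:" && !seen { for (i = 2; i <= NF; i++) name[i] = $i; seen = 1; next }
    # Second TcpExt line: print the values of the two counters of interest.
    $1 == "TcpExt:" && seen {
      for (i = 2; i <= NF; i++)
        if (name[i] == "ListenOverflows" || name[i] == "ListenDrops")
          print name[i], $i
      exit
    }
  ' "$file"
}
```

Calling `tcpext_counters` twice a few seconds apart and comparing the numbers shows whether drops are happening right now; the counters only ever grow since boot.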

Sergey Fedorov
-
Thank you! These are mostly "generic" tools, which are really valuable! But I was looking for something more specific, similar to the metrics Puma provides for Ruby, for example. – linkyndy Aug 06 '20 at 07:54
-
Could you provide a list of the required metrics? I may be wrong, but I think you just don't know how and where to look for the problem and are hoping to spot some anomaly. Can you provide more details on the steps that led to the connection being refused? If you can show the source, that would be perfect. – Sergey Fedorov Aug 06 '20 at 11:11
-
The only thing I'd add is profiling (first section here: https://crystal-lang.org/reference/guides/performance.html) in case it's CPU bound. You can also see the backlog for a particular port, e.g.: https://www.quora.com/How-can-I-check-TCP-backlog-queue-for-a-specific-process-on-Linux – rogerdpack Aug 06 '20 at 14:24
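
As a rough illustration of checking the backlog for one port: on Linux, `/proc/net/tcp` lists sockets with hex-encoded addresses, and for sockets in the LISTEN state (`st` = `0A`) the `rx_queue` field is, to the best of my knowledge, the number of connections currently waiting to be accepted. The helper name `listen_backlog` and port 3000 are assumptions for the example; pass `/proc/net/tcp6` as the second argument if the server listens on IPv6.

```sh
# Print the current accept-queue length of every listening socket on a port.
# Assumption: Linux /proc/net/tcp layout, where field 2 is local addr:port
# (hex), field 4 is the state, and field 5 is tx_queue:rx_queue (hex).
# listen_backlog is a hypothetical helper name.
listen_backlog() {
  port="$1"
  file="${2:-/proc/net/tcp}"
  # Ports are stored as 4 upper-case hex digits, e.g. 3000 -> 0BB8.
  hex_port=$(printf '%04X' "$port")
  awk -v want="$hex_port" '
    $4 == "0A" {                       # 0A = TCP_LISTEN
      split($2, addr, ":")
      if (addr[2] == want) { split($5, q, ":"); print q[2] }
    }
  ' "$file" | while read -r hex_len; do
    printf '%d\n' "0x$hex_len"         # hex rx_queue -> decimal
  done
}
```

For example, `listen_backlog 3000` prints one number per listening socket on port 3000. `ss -ltn` should report the same figure in its Recv-Q column for listening sockets, with the configured maximum backlog in Send-Q.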
-
I am referring to something similar to https://github.com/harmjanblok/puma-metrics. Regarding the exact metrics, I've mentioned them in my original post: the number of fibers running/waiting (out of the maximum number allowed), how large is the backlog of connections, how many have been refused. Regarding the connection refused, we have a service that tries to connect to this service and the connection is refused...the source is just an HTTP call. – linkyndy Aug 10 '20 at 09:19
-
For more than 4 years I have been running about 2-5 services (HTTP + TCP & MessagePack) under load (70-150 rps), plus some others for data processing, scraping and communication between hosts. As far as I remember, I have never used general metrics because my load profile is different. At first I tried to solve problems with stuck sockets and CLOSE_WAIT (strace, lsof/ss), then with null bytes that randomly appeared in the data (wireshark), and after that I tried to increase overall efficiency to reduce cost (the main logic was rewritten with fibers). But I also have a Rails app nearby with common metrics and popular gems. – Sergey Fedorov Aug 10 '20 at 19:12
-
From my point of view, to debug a Crystal app you almost always need to know the state of the environment, whereas for Ruby that is not necessary as long as there is enough CPU, memory and disk space. Maybe that's why I can't recommend any shards for collecting metrics. What I mean is that Ruby is a thing in itself, while a Crystal app is a native element of the system and can be examined with system tools. In any case, if you have time, tell us later how you solved the problem. – Sergey Fedorov Aug 10 '20 at 19:12
-
TIL: `ngrep` is like grep but for the network: https://github.com/jpr5/ngrep/blob/master/EXAMPLES.md – Sergey Fedorov Aug 23 '20 at 02:30