I'm trying to set up some new hosts in munin for monitoring. For some reason it ain't happening!

Here's what I've tried so far.

On the munin server, which is already monitoring several other hosts, I've added the host I want in /etc/munin/munin.conf

[db1]
    address   10.10.10.25 # <- obscured the real IP address 
    use_node_name yes

And on the db1 host I have this set in /etc/munin/munin-node.conf

host_name  db1.example.com
allow ^127\.0\.0\.1$
allow ^10\.10\.10\.26$
allow ^::1$
port 4949

And I made sure to restart the services on both machines.

From the monitoring host I can telnet to the new server I want to monitor on the munin port:

[root@monitor3:~] #telnet db1.example.com 4949
Trying 10.10.10.26...
Connected to db1.example.com.
Escape character is '^]'.
# munin node at db1.example.com
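
The telnet session above only shows that the banner arrives. As a rough sketch of a slightly stronger check, the script below also sends the node's `list` command and reads the reply, which exercises the same kind of socket read that the master performs. The helper name `probe_munin_node` and the 5-second timeout are assumptions made for illustration, not part of munin.

```python
import socket


def probe_munin_node(host, port=4949, timeout=5.0):
    """Connect to a munin-node, read its banner, then ask for the plugin list.

    Returns (banner, plugins) as stripped strings. Raises OSError (or a
    subclass) if the connection fails or the node closes the socket before
    sending a banner.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rwb") as stream:
        # The node speaks first: "# munin node at <hostname>"
        banner = stream.readline().decode().strip()
        if not banner:
            raise ConnectionError("node closed the connection before sending a banner")

        # "list" asks the node for the names of its plugins.
        stream.write(b"list\n")
        stream.flush()
        plugins = stream.readline().decode().strip()

        # Be polite and end the session.
        stream.write(b"quit\n")
        stream.flush()
    return banner, plugins
```

Run from the monitoring host, `probe_munin_node("db1.example.com")` should return the banner and a space-separated plugin list. An empty plugin string or an exception would suggest the problem sits between the banner and the first real read.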

I wait a few minutes... and nothing! The new server won't appear in the munin dashboard on the munin monitoring host.

In the /var/log/munin/munin-update.log log on the db1 host (the one I'm trying to monitor) I find this:

2015/11/30 03:20:02 [INFO] starting work in 14199 for db1/10.10.10.26:4949.

2015/11/30 03:20:02 [FATAL] Socket read from db1 failed.  Terminating process. at /usr/share/perl5/vendor_perl/Munin/Master/UpdateWorker.pm line 254.

2015/11/30 03:20:02 [ERROR] Munin::Master::UpdateWorker<db1;db1> died with '[FATAL] Socket read from db1 failed.  Terminating process. at /usr/share/perl5/vendor_perl/Munin/Master/UpdateWorker.pm line 254.

What could be going on here? And how can I solve this?

bluethundr
  • Check if port is available? – Somnath Muluk Feb 15 '16 at 10:17
  • What about the node's logs? Do they say anything about it? – muru Feb 15 '16 at 23:45
  • `10.10.10.25 != 52.3.28.48` – john Smith Feb 16 '16 at 14:38
  • john Smith, you caught me attempting to obfuscate the IPs. I just corrected the post so that it makes logical sense. Somnath Muluk - the ports are available on both hosts: monitor3: [root@monitor3:~] #lsof -i :4949 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME munin-nod 31800 root 5u IPv6 31820297 0t0 TCP *:munin (LISTEN) db1: [root@db1:~] #lsof -i :4949 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME munin-nod 14164 root 5u IPv6 26604748 0t0 TCP *:munin (LISTEN) muru the log I posted is from the db1 host that I am trying to monitor. – bluethundr Feb 16 '16 at 18:22
  • @bluethundr that is *very* surprising. The log is what I would expect to see on a master (`monitor3`, in this case). Note how it says "starting work ... for node/ip:port". Indeed, `munin-update.log` would be on the master, not the node. – muru Feb 16 '16 at 18:32

1 Answer

Since you have already verified that your network connection is OK, as a first step I would simplify munin-node.conf. Currently you have:

host_name  db1.example.com
allow ^127\.0\.0\.1$
allow ^10\.10\.10\.26$
allow ^::1$
port 4949

From these I would remove:

  • host_name (it is probably redundant.)
  • The IPv6 loopback address. (I don't think you need it, but you can add it back later if you do need it)
  • The IPv4 loopback address. (same as above)

If it is still not working, you could rule out any issue with the allow config entirely by replacing the individual IPs with:

cidr_allow 10.10.10.0/24

This would allow connections from a full range of IPs, in case your munin server appears to db1 to be connecting from a different IP than you expect.
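
To see why the two settings behave differently, here is a minimal sketch of the matching rules: `allow` is a regular expression tested against the peer's address as text, while `cidr_allow` is a network range. This only imitates the semantics with Python's `re` and `ipaddress` modules; it is not munin-node's own code, and the helper names `allowed_by_regex` and `allowed_by_cidr` are made up for illustration.

```python
import ipaddress
import re


def allowed_by_regex(peer_ip, patterns):
    """Imitates `allow`: any pattern that matches the address string wins."""
    return any(re.search(pattern, peer_ip) for pattern in patterns)


def allowed_by_cidr(peer_ip, networks):
    """Imitates `cidr_allow`: the address must fall inside one of the networks."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in ipaddress.ip_network(net) for net in networks)


if __name__ == "__main__":
    # The exact-match regex lets through only that one address.
    print(allowed_by_regex("10.10.10.26", [r"^10\.10\.10\.26$"]))
    print(allowed_by_regex("10.10.10.25", [r"^10\.10\.10\.26$"]))
    # The /24 range accepts any peer in 10.10.10.0 to 10.10.10.255.
    print(allowed_by_cidr("10.10.10.25", ["10.10.10.0/24"]))
```

One thing to watch with ranges: `0.0.0.0/24` covers only 0.0.0.0 to 0.0.0.255, so it does not act as a catch-all. The range that matches every IPv4 address is `0.0.0.0/0`.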

Gergely Bacso
  • Hi, ok so I tried everything you mention except for cidr_allow. Since i know what IP my munin server is coming from. My config on db1 looks like this: [root@db1:/etc/munin] #egrep -v "^$|^#" munin-node.conf log_level 4 log_file /var/log/munin-node/munin-node.log pid_file /var/run/munin/munin-node.pid background 1 setsid 1 user root group root ignore_file [\#~]$ ignore_file DEADJOE$ ignore_file \.bak$ ignore_file %$ ignore_file \.dpkg-(tmp|new|old|dist)$ ignore_file \.rpm(save|new)$ ignore_file \.pod$ allow ^54\.174\.234\.136$ host * port 4949 And I restarted munin on both server and client – bluethundr Feb 16 '16 at 21:25
  • Ok. A few things then: I would still try to use `cidr_allow`, just for debugging purposes. The `allow` setting relies on regexp. So there might be dragons. Also what is your munin version? And finally: you forgot to anonymize your IP in the previous comment. – Gergely Bacso Feb 16 '16 at 22:20
  • OK, thanks. I did try cidr_allow in the munin-node conf on db1. I tried first with the IP range of the munin server and then again with just cidr_allow 0.0.0.0/24. Tho I am not sure if that's allowed: – bluethundr Feb 17 '16 at 04:38
  • This is my munin-node conf on db1 on my last attempt: `[root@db1:/etc/munin] #egrep -v "^$|^#" munin-node.conf` ` log_level 4` ` log_file /var/log/munin-node/munin-node.log` `pid_file /var/run/munin/munin-node.pid` ` background 1` ` setsid 1` `user root` `group root` `ignore_file [\#~]$` ` ignore_file DEADJOE$` ` ignore_file \.bak$` `ignore_file %$` ` ignore_file \.dpkg-(tmp|new|old|dist)$` `ignore_file \.rpm(save|new)$` `ignore_file \.pod$` `allow ^10\.10\.10\.26$` `cidr_allow 0.0.0.0/24` `host *` `port 4949` – bluethundr Feb 17 '16 at 04:41
  • I reinstalled it on my machine, but I could not reproduce your error. So last guess: in your `munin.conf` you are referring to your host with a simple hostname (`db1`), but it identifies itself with FQDN (`db1.example.com`). That is something munin can be sensitive about. Could you change the `munin.conf` to use the FQDN as well? – Gergely Bacso Feb 17 '16 at 07:55
  • I tried changing the hostname in munin.conf on the server to the host's FQDN. However that didn't seem to have any effect. I think at this point the problem is with the server. I'm still seeing these lines in the munin-update log that I have in the OP: – bluethundr Feb 17 '16 at 13:10
  • `2016/02/17 03:20:02 [INFO] starting work in 22254 for db1/10.10.10.25:4949. 2016/02/17 03:20:02 [FATAL] Socket read from db1 failed. Terminating process. at /usr/share/perl5/vendor_perl/Munin/Master/UpdateWorker.pm line 254. 2016/02/17 03:20:02 [ERROR] Munin::Master::UpdateWorker died with '[FATAL] Socket read from db1 failed. Terminating process. at /usr/share/perl5/vendor_perl/Munin/Master/UpdateWorker.pm line 254.` I think the answer must relate to that error. I'm just unsure how to address it. – bluethundr Feb 17 '16 at 13:11
  • The `Socket read from db1 failed` message suggests that your change to the FQDN was not taken into account. It should read `Socket read from db1.example.com` if the change was properly applied. – Gergely Bacso Feb 17 '16 at 13:15
  • I think I was looking up too high in the logs on that last post. Next thing I noticed after the name change to the FQDN was this happening in the logs: http://pastebin.ca/3375467 I didn't see any errors in that output. But I still am not seeing the node turn up in munin. – bluethundr Feb 17 '16 at 13:49
  • Based on the logmessage you posted you do have proper connection to the node-server now. That is a good sign. Plugins are reporting warnings on some missing fields. If you are sure that you do not have the graphs prepared (check `/var/cache/munin/www/index.html` to be sure) then check munin-html.log please. – Gergely Bacso Feb 17 '16 at 15:34
  • Sorry guys. I got really tired of dealing with this issue. It seemed to me that the issue was on the server end, and not the client. So I tried stopping the problematic munin server. Spun up a new one on AWS. Installed munin again, and voila! The problem clients started showing up in the munin dashboard. Lame, I know. But hey, it works! ;) Sorry guys. But the bounty stays with yours truly. I do appreciate your thought and input however. Not trying to be an asshole. But I solved the problem. – bluethundr Feb 18 '16 at 05:32
  • I bumped into an e-mail of yours sent in December, so it is fully understandable. :) I am still wondering what was the issue, but you got it working, that is what matters the most. – Gergely Bacso Feb 18 '16 at 09:44
  • Cool, thanks Gergely. I appreciate you understanding. I am having a couple of other sticking points with munin that I may post about on Stack Overflow. Haven't gotten as much help from the munin list as I'd like. I guess maybe it's not that trafficked at this point? – bluethundr Feb 18 '16 at 15:51
  • Looks like its fame is slowly fading: https://www.google.com/trends/explore#q=munin ... – Gergely Bacso Feb 18 '16 at 15:55
  • yeah man, that's unfortunate. Munin is one of my favorite old standbys for RRD graphing. I'll keep using it despite its lack of popularity! – bluethundr Feb 18 '16 at 17:14