17

I am running RabbitMQ v3.3.5 with Erlang OTP 17.1 on Windows 2008 R2. My Dev and QA environments are stand-alone. My staging and production environments are clustered.

I often run into a problem where the RabbitMQ service is running and the RabbitMQ management console sees everything, but rabbitmqctl fails from the command line with an error saying that the node is down (I tried both locally and from a remote server).

This problem is resolved if I restart the Windows service.

I see no error message in the RabbitMQ error log. The last message indicated that the node was up.

Below is example output from the issue as I recently experienced it on node 2 of our staging Windows cluster:

PS C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.3.5\sbin> .\rabbitmqctl.bat status
Status of node rabbit@MYSERVER2 ...
Error: unable to connect to node rabbit@MYSERVER2: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@MYSERVER2]

rabbit@MYSERVER2:
  * connected to epmd (port 4369) on MYSERVER2
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on MYSERVER2
  * suggestion: start the node

current node details:
- node name: rabbitmqctl2199771@MYSERVER2
- home dir: C:\Users\RabbitMQ
- cookie hash: mn6OaTX9mS4DnZaiOzg8pA==

At this point I restart the RabbitMQ service and then try again:

PS C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.3.5\sbin> .\rabbitmqctl.bat status
Status of node rabbit@MYSERVER2...
[{pid,3784},
 {running_applications,
     [{rabbitmq_management_agent,"RabbitMQ Management Agent","3.3.5"},
      {rabbit,"RabbitMQ","3.3.5"},
      {os_mon,"CPO  CXC 138 46","2.2.15"},
      {mnesia,"MNESIA  CXC 138 12","4.12.1"},
      {xmerl,"XML parser","1.3.7"},
      {sasl,"SASL  CXC 138 11","2.4"},
      {stdlib,"ERTS  CXC 138 10","2.1"},
      {kernel,"ERTS  CXC 138 10","3.0.1"}]},
 {os,{win32,nt}},
 {erlang_version,
     "Erlang/OTP 17 [erts-6.1] [64-bit] [smp:4:4] [async-threads:30]\n"},
 {memory,
     [{total,35960208},
      {connection_procs,2704},
      {queue_procs,5408},
      {plugins,111936},
      {other_proc,13695792},
      {mnesia,102296},
      {mgmt_db,0},
      {msg_index,21816},
      {other_ets,884704},
      {binary,25776},
      {code,16672826},
      {atom,602729},
      {other_system,3834221}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{amqp,5672,"0.0.0.0"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,3435787059},
 {disk_free_limit,50000000},
 {disk_free,74911649792},
 {file_descriptors,
     [{total_limit,8092},
      {total_used,4},
      {sockets_limit,7280},
      {sockets_used,2}]},
 {processes,[{limit,1048576},{used,139}]},
 {run_queue,0},
 {uptime,5}]
...done.

Any idea as to what causes this and how to automatically detect the situation?

Is this specifically a problem with running RabbitMQ on Windows?

Matteo
Alf47
  • I have confirmed that the cookie hash in the error message matches the cookie hash of the last successful service restart in the log file and that that hash also matches the cookie hash of the last successful service restart on the other node. – Alf47 Aug 21 '14 at 18:31
  • Having the exact same problem. Looks like the discussion is continued on the mailing list https://groups.google.com/forum/#!topic/rabbitmq-users/Zn8unuF4bTM – skMed Aug 22 '14 at 15:58
  • Yes, I am going to continue to keep this up to date with the latest information as well. So far, the only further solid information I have is that I was able to confirm that when the issue is happening the epmd.exe process is not running on the server. I can see this in the Windows task manager. As soon as I restart the RabbitMQ service, the epmd.exe process spawns and everything is working correctly. – Alf47 Aug 22 '14 at 17:22
  • I got this issue and resolved it with this method: https://stackoverflow.com/questions/38523236/rabbitmqctl-start-app-error-on-os-x-unable-to-connect-to-node-rabbitlocalhost/45955092#45955092 – aircraft Aug 30 '17 at 08:06

4 Answers

10

Hostnames are case-insensitive when you are trying to resolve them. For example, LOCALHOST and localhost are the same host.

However, when Erlang constructs the name of a node (e.g. rabbit@<hostname> in the case of RabbitMQ), this name is case-sensitive. So rabbit@LOCALHOST and rabbit@localhost are two different node names, even if they run on the same host.

Recently, we (the RabbitMQ team) found out that, on Windows, the node name constructed for RabbitMQ was inconsistent. As a result, RabbitMQ started as a Windows service could sometimes be named rabbit@MYHOST, while rabbitmqctl would try to reach rabbit@myhost and fail.

Since RabbitMQ 3.6.0, the node name should be consistent.
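To make the distinction concrete, here is a minimal Python sketch of the two comparison rules described above. It is only an illustration, not RabbitMQ or Erlang code: the function names (`same_host`, `same_node`, `diagnose`) are hypothetical, and it simply models hostnames as case-insensitive and node names as exact strings.

```python
def same_host(a: str, b: str) -> bool:
    """Hostnames resolve case-insensitively, so compare them case-folded."""
    return a.casefold() == b.casefold()


def same_node(a: str, b: str) -> bool:
    """Node names are compared exactly, so letter case matters."""
    return a == b


def diagnose(service_node: str, ctl_node: str) -> str:
    """Classify why a tool targeting ctl_node may not find service_node."""
    s_name, _, s_host = service_node.partition("@")
    c_name, _, c_host = ctl_node.partition("@")
    if same_node(service_node, ctl_node):
        return "match"
    if s_name == c_name and same_host(s_host, c_host):
        # Same machine, same short name, but the node names differ in case.
        return "case mismatch"
    return "different node"
```

With the names from this answer, `diagnose("rabbit@MYHOST", "rabbit@myhost")` reports a case mismatch: both strings point at the same host, yet they are two different node names.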

  • On Windows 10 Pro, the RabbitMQ installer fails to properly configure RabbitMQ as a service. I had to run `rabbitmq-service remove` and then `rabbitmq-service install` for it to work properly. – Rosdi Kasim Jun 29 '16 at 08:56
  • Hi! Could you please bring this issue to the rabbitmq-users mailing-list? https://groups.google.com/forum/#!forum/rabbitmq-users – Jean-Sébastien Pédron Jul 01 '16 at 07:58
  • Looks like the problem still persists. It is trying to reach the node `rabbit@` and my system's hostname is in lowercase characters. Is there a solution to this problem? Or is there a way I can set the hostname in capitals, and will this solve my problem? – phougatv Aug 17 '18 at 08:27
2

To anyone else getting this error, this was my fix. I installed Erlang, but overlooked the instructions on setting up the environment variable.

I was reading the manual install page: https://www.rabbitmq.com/install-windows-manual.html and found the following:

Set ERLANG_HOME to where you actually put your Erlang installation, e.g. C:\Program Files\erlx.x.x (full path). The RabbitMQ batch files expect to execute %ERLANG_HOME%\bin\erl.exe.

Go to Start > Settings > Control Panel > System > Advanced > Environment Variables. Create the system environment variable ERLANG_HOME and set it to the full path of the directory which contains bin\erl.exe.

For some reason, the automatic install assigned the wrong path to the ERLANG_HOME variable (see image below). I simply added \bin on the end.

[image]
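As a quick sanity check of the rule quoted from the manual (the batch files run `%ERLANG_HOME%\bin\erl.exe`), the following Python sketch expands a candidate ERLANG_HOME value the same way. It is an illustration only: `erl_path` and `erlang_home_is_valid` are hypothetical helper names, the `erl7.3` paths are example values, and `ntpath` is used so that Windows path rules apply on any platform.

```python
import ntpath
import os


def erl_path(erlang_home: str) -> str:
    """Return the path that %ERLANG_HOME%\\bin\\erl.exe expands to."""
    return ntpath.join(erlang_home, "bin", "erl.exe")


def erlang_home_is_valid(erlang_home: str) -> bool:
    """True if erl.exe exists at the expanded location (run this on the server itself)."""
    return os.path.isfile(erl_path(erlang_home))


# Example values only; substitute the actual installation directory.
print(erl_path(r"C:\Program Files\erl7.3"))
print(erl_path(r"C:\Program Files\erl7.3\bin"))
```

The second call shows what happens when the variable already ends in `\bin`: the expanded path contains `\bin\bin\erl.exe`, so it is worth confirming on the machine which value actually resolves to an existing `erl.exe`.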

NealWalters
  • 1
    Your quote states "The RabbitMQ batch files expect to execute `%ERLANG_HOME%\bin\erl.exe`" and therefore `ERLANG_HOME` should be set to the directory _containing_ `bin\erl.exe`. You have concluded `ERLANG_HOME` should be set to `C:\Program Files (x86)\erl7.3\bin`, but that would leave `[...]\bin\bin\erl.exe` after expansion. Is that correct? – Samuel Harmer Jul 10 '18 at 17:41
  • Sorry, don't remember all the issues. – NealWalters Jul 10 '18 at 18:35
  • I tried both `\bin\erl.exe` and `\bin` and both return path can't be found or something along these lines, so I guess this is not the solution any more or at least not for RabbitMQ 3.6 and erl 8.1 – et3rnal Jul 14 '20 at 05:06
1

I had a similar problem on my Linux box and am posting the answer here because RabbitMQ on Windows may handle things similarly.

My post and solution: rabbtimqadmin - Could not connect: [Errno -2] Name or service not known

The core issue was changing the server name after RabbitMQ was configured. When installed, RabbitMQ references the server's name, making it part of its configuration. I can see this being a similar issue on Windows.

In short, you can either change the server's name back to what it was when you first installed RabbitMQ, or add a rabbitmq-env.conf file. I'm not sure where it would go on Windows, but the following gives details for Linux: https://www.rabbitmq.com/man/rabbitmq-env.conf.5.man.html

Note that on Linux the name of the server was CaSe SENSiTivE! So you may or may not have a similar issue on Windows.

Hope this helps and good luck!

Community
James Oravec
  • 1
    I found that the epmd process was halting and that was what was ultimately breaking things. When I restarted the service, it turned that process back on and everything started working. So I ended up creating a service monitor that not only checks that the RabbitMQ service is running but also that the epmd process is running. If either of those checks fails, it alerts me and restarts the RabbitMQ service. – Alf47 Apr 10 '15 at 14:08
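A monitor along the lines described in the comment above could be sketched roughly as follows. This is not the commenter's actual monitor; it is a minimal Python sketch under stated assumptions: the helper names are hypothetical, the detection is a simple text match on the output of the Windows `tasklist` command and of `rabbitmqctl status`, and `rabbitmqctl.bat` is assumed to be reachable at the path you pass in. Alerting and restarting the service are left to the caller.

```python
import subprocess


def epmd_running(tasklist_output: str) -> bool:
    """True if the tasklist output mentions an epmd.exe process."""
    return "epmd.exe" in tasklist_output.lower()


def node_reported_down(ctl_output: str) -> bool:
    """True if rabbitmqctl printed the 'nodedown' error shown in the question."""
    return "nodedown" in ctl_output


def needs_restart(tasklist_output: str, ctl_output: str) -> bool:
    """Flag the broken state: epmd is missing or rabbitmqctl cannot reach the node."""
    return (not epmd_running(tasklist_output)) or node_reported_down(ctl_output)


def check_once(ctl_path: str = "rabbitmqctl.bat") -> bool:
    """Run both checks on a Windows host (ctl_path is an assumed location)."""
    tasks = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq epmd.exe"],
        capture_output=True, text=True,
    )
    status = subprocess.run([ctl_path, "status"], capture_output=True, text=True)
    return needs_restart(tasks.stdout, status.stdout + status.stderr)
```

Keeping the text checks separate from the subprocess calls means the detection logic can be exercised against captured output, such as the diagnostics in the question, without a running broker.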
0

If you are using Linux, try changing the permissions of the /var/lib/rabbitmq/mnesia folder.

Tushar Saxena