I'm getting a "connection refused" error from my datadog-agent, which is trying to collect JMX (via RMI) metrics from an in-house application that runs in its own Docker container. However, jconsole is able to collect the metrics from that same application. The datadog-agent runs in a container of its own. Both containers are on the same network on the same host. Any ideas? I have looked at the other Stack Overflow questions.

  • I have tried both the IP address 0.0.0.0 and the specific host address in the custom jmx.yaml file (/etc/dd-agent/conf.d/jmx.yaml).

Docker Container 0:
* Runs my_streams_app, which outputs Kafka Streams metrics
* Executed via:

    docker run -d --name my_streams_app \
      -v /var/run/docker.sock:/var/run/docker.sock:ro \
      -v /proc/:/host/proc/:ro \
      -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
      -e API_KEY=someapikeyhere \
      -e SD_JMX_ENABLE=yes -p 9998:9998 --network=my_streams_default quay.io/temp/my_streams

  • jconsole is able to pick up the metrics emitted.

Docker Container 1:
* Runs datadog-agent within a container
* The datadog-agent uses the JMX default (RMI) to fetch the metrics from my_streams_app, which runs in container 0, above.
* Both containers run on the same network on the same host (my laptop, Mac OS X)
* I am able to netcat from within the datadog-agent container to the my_streams_app IP and port in the other container, using 0.0.0.0 and 9998; specific IP addresses also work
* Command used to run the datadog agent in a container:

    docker run -v /var/run/docker.sock:/var/run/docker.sock:ro -v /proc/:/host/proc/:ro -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro -e LOG_LEVEL=DEBUG -e SD_BACKEND=docker --network=mystreams_default 4b1488e74733

  • JMX configuration used by the datadog agent to collect the metrics from within its container:

    instances:
      - host: 0.0.0.0
        port: 9998
        tags:
          newTag: my_streams
        jmx_url: "service:jmx:rmi:///jndi/rmi://0.0.0.0:9998/jmxrmi"
        name: jmx_instance

    docker_images:
      - my_streams_app

    init_config:
      is_jmx: true
      conf:
        - include:
            domain: '"kafka.streams"'
            bean: '"kafka.streams":type="stream-metrics",client-id="my_test-1-StreamThread-1"'
            attribute:
              commit-calls-rate:
                metric_type: gauge
              commit-time-avg:
                metric_type: gauge
              commit-time-max:
                metric_type: gauge
              poll-calls-rate:
                metric_type: gauge

JConsole:
* collects metrics from my_streams_app within the docker container 0, above via:

    jconsole 0.0.0.0:9998

Error output:

2017-07-05 20:48:20,236 | ERROR | App | Cannot connect to instance service:jmx:rmi:///jndi/rmi://0.0.0.0:9998/jmxrmi. java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 0.0.0.0; nested exception is: 
    java.net.ConnectException: Connection refused (Connection refused)]
java.io.IOException: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 0.0.0.0; nested exception is: 
    java.net.ConnectException: Connection refused (Connection refused)]
    at org.datadog.jmxfetch.Connection.connectWithTimeout(Connection.java:117)
    at org.datadog.jmxfetch.Connection.createConnection(Connection.java:61)
    at org.datadog.jmxfetch.RemoteConnection.<init>(RemoteConnection.java:56)
    at org.datadog.jmxfetch.ConnectionFactory.createConnection(ConnectionFactory.java:29)
    at org.datadog.jmxfetch.Instance.getConnection(Instance.java:162)
    at org.datadog.jmxfetch.Instance.init(Instance.java:173)
    at org.datadog.jmxfetch.App.init(App.java:511)
    at org.datadog.jmxfetch.App.main(App.java:115)
Caused by: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 0.0.0.0; nested exception is: 
    java.net.ConnectException: Connection refused (Connection refused)]
    at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:370)
    at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:268)
    at org.datadog.jmxfetch.Connection$1.run(Connection.java:86)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 0.0.0.0; nested exception is: 
    java.net.ConnectException: Connection refused (Connection refused)]
    at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:142)
    at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:204)
    at javax.naming.InitialContext.lookup(InitialContext.java:415)
    at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1928)
    at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1895)
    at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287)
    ... 7 more
Caused by: java.rmi.ConnectException: Connection refused to host: 0.0.0.0; nested exception is: 
    java.net.ConnectException: Connection refused (Connection refused)
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
    at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
    at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
    at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:341)
    at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
    at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:138)
    ... 12 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:580)
    at java.net.Socket.connect(Socket.java:529)
    at java.net.Socket.<init>(Socket.java:429)
    at java.net.Socket.<init>(Socket.java:209)
    at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
    at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:147)
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)

rmiregistry has been started, as suggested in the question "Failed to retrieve RMIServer stub".

  • You are missing a lot of relevant information here. What is the Datadog configuration for connecting to JMX? How are the containers started (`docker` commands)? How is jconsole connecting (IP, credentials, port, etc)? What versions of things (Docker, host OS, etc)? Please amend your question to add this information so we can better reproduce the issue. – Andy Shinn Jul 05 '17 at 19:01

1 Answer

The solution:

  • docker container 0

    • This is the container running the application that outputs the metrics.
    • Create a shell script inside the application image that starts the application.
    • Within the script, set both com.sun.management.jmxremote.host and java.rmi.server.hostname to the value of the container's $HOSTNAME environment variable.

    #!/bin/sh
    java -Djava.util.logging.config.file=logging.properties \
      -Dcom.sun.management.jmxremote \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false \
      -Dcom.sun.management.jmxremote.rmi.port=9998 \
      -Dcom.sun.management.jmxremote.port=9998 \
      -Djava.rmi.server.hostname=$HOSTNAME \
      -Dcom.sun.management.jmxremote.host=$HOSTNAME \
      -Dcom.sun.management.jmxremote.local.only=false \
      -jar /app/my-streams.jar

    • Remember to make the script executable with chmod +x.
    • Set the Dockerfile CMD to run the script above, like so:
      CMD["./"]
  • docker container 1

    • This is the container running the datadog agent.
    • Configure the jmx.yaml file as shown in the question, but set the host to the application name instead of 0.0.0.0.
  • A lot more was tried, based on other Stack Overflow posts, but the steps above are what fixed the datadog-agent's failure to find the metrics.
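
To make the jmx.yaml change concrete, here is a minimal sketch that prints the instances section with the host set to a name instead of 0.0.0.0. The helper name write_jmx_instance is hypothetical, and using my-streams-app as the host assumes the application container is reachable under that name on the shared Docker network; the keys simply mirror the jmx.yaml from the question.

```shell
#!/bin/sh
# Hypothetical helper: print the "instances" section of jmx.yaml for one JMX target.
# $1 = host name the agent should connect to, $2 = JMX/RMI port.
write_jmx_instance() {
  host="$1"
  port="$2"
  cat <<EOF
instances:
  - host: ${host}
    port: ${port}
    jmx_url: "service:jmx:rmi:///jndi/rmi://${host}:${port}/jmxrmi"
    name: jmx_instance
EOF
}

# Assumption: the application container is named my-streams-app and listens on 9998.
write_jmx_instance my-streams-app 9998
```

The output would replace the instances section of /etc/dd-agent/conf.d/jmx.yaml; the init_config section from the question stays as it is.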


Here is how to run each component:

docker container 0
* my-streams
* in one tab, spin up the dependent services
** mvn clean package docker:build
** docker-compose up
* in another tab, spin up my-streams-app
** docker kill my-streams-app
** docker rm my-streams-app
** docker run -d --name my-streams-app -p 9998:9998 --network=mystreams_default quay.io/myimage/my-streams

docker container 1
* docker build -t dd-agent-my-streams .
* docker run -v /var/run/docker.sock:/var/run/docker.sock:ro -v /proc/:/host/proc/:ro -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro -e LOG_LEVEL=DEBUG -e SD_BACKEND=docker --network=mystreams_default

Open a shell in docker container 1 to verify that the metrics are collected
* docker ps // to find the name of the container to log into
* docker exec -it <container-name> /bin/bash
root@904e6561cc97:/# service datadog-agent configcheck
root@904e6561cc97:/# service datadog-agent jmx list_everything
root@904e6561cc97:/# service datadog-agent jmx collect
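
Before running the jmx commands, it can help to confirm from inside the agent container that the JMX port is reachable under the name used in jmx.yaml, much like the netcat check mentioned in the question. This is a minimal sketch, assuming a netcat that supports -z and -w is installed in the agent container; the helper name jmx_port_open and the target my-streams-app:9998 are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Assumes a netcat that supports -z (connect without sending data) and -w (timeout in seconds).
jmx_port_open() {
  nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

# Illustrative target: the application container name and JMX port from this answer.
if jmx_port_open my-streams-app 9998; then
  echo "JMX port reachable"
else
  echo "JMX port NOT reachable"
fi
```

A successful check only shows that the TCP port is open. The RMI stub returned by the registry must also advertise a hostname the agent can resolve, which is what the java.rmi.server.hostname setting in the start script takes care of.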
