I have been trying to get a Spark Streaming job, running on an EC2 instance, to report to VisualVM using JMX.

As of now I have the following config file:

spark/conf/metrics.properties:

*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

And I start the Spark Streaming job like this (I added the -D flags afterwards in the hope of getting remote access to the EC2 instance's JMX):

terminal:

spark/bin/spark-submit --class my.class.StarterApp --master local --deploy-mode client \
  project-1.0-SNAPSHOT.jar \
    -Dcom.sun.management.jmxremote \
    -Dcom.sun.management.jmxremote.port=54321 \
    -Dcom.sun.management.jmxremote.authenticate=false \
    -Dcom.sun.management.jmxremote.ssl=false
Havnar
  • In what way doesn't it work, can you connect at all? – Klara Dec 18 '14 at 14:26
  • Adding the public IP to VisualVM is as far as I get. I can't get any further connection. I also have every connection in and out open as a security group of EC2 (I'm not sure if the port I supply with the -D params is even doing something) – Havnar Dec 18 '14 at 14:29
  • Just for others who'll end up here searching for the solution (look at the second answer, not the accepted one): http://stackoverflow.com/questions/19130877/jmx-connection-to-amazon-ec2-fails?rq=1 – Marko Bonaci May 24 '15 at 18:22

2 Answers

There are two issues with the spark-submit command line:

  1. local - you should not use the plain local master URL, because it provides only a single thread to run your computations, and a Spark Streaming job needs at least two: one for the receiver and another for processing the received data. Use local[n] with n > 1. You should see the following WARN in the logs:

WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data, otherwise Spark jobs will not get resources to process the received data.

  2. The -D options are not picked up by the JVM because they appear after the Spark Streaming application's jar and effectively become its command-line arguments. Put them before project-1.0-SNAPSHOT.jar and start over (you have to fix the above issue first!)
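Putting both fixes together, a corrected submit command could look like the sketch below (my suggestion, not tested on the asker's exact setup; --driver-java-options passes the flags to the driver JVM, and local[2] leaves one thread for the receiver and one for processing):

```shell
# Sketch: master URL with more than one thread, and the JMX flags
# handed to the driver JVM rather than to the application as arguments.
spark/bin/spark-submit --class my.class.StarterApp \
  --master "local[2]" --deploy-mode client \
  --driver-java-options "-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=54321 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false" \
  project-1.0-SNAPSHOT.jar
```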
Jacek Laskowski
  • 72,696
  • 27
  • 242
  • 420
spark-submit --conf "spark.driver.extraJavaOptions=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8090 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" /path/example/src/main/python/pi.py 10000

Note: the configuration format is --conf "params". Tested under Spark 2.+.
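When connecting from outside EC2, two more things usually bite: JMX over RMI opens a second, randomly chosen port unless it is pinned, and it advertises the instance's private address by default. A sketch that pins both ports and sets the public hostname (replace <ec2-public-ip> with your instance's public address, open 8090 in the security group; the jmxremote.rmi.port flag needs Java 7u4 or later):

```shell
# Sketch: pin the JMX and RMI ports to the same number and advertise
# the public address so a remote VisualVM can reach the driver.
spark-submit --conf "spark.driver.extraJavaOptions=\
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=8090 \
-Dcom.sun.management.jmxremote.rmi.port=8090 \
-Djava.rmi.server.hostname=<ec2-public-ip> \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false" \
  /path/example/src/main/python/pi.py 10000
```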

Hao