How can I check that my Spark Cluster do work?

Question

I have installed Spark 2.3.0 on Ubuntu 18.04 with two nodes: a master one (ip: 172.16.10.20) and a slave one (ip: 172.16.10.30). I can check that this Spark cluster looks like up and running

jps -lm | grep spark
14165 org.apache.spark.deploy.master.Master --host 172.16.10.20 --port 7077 --webui-port 8080
13701 org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://172.16.10.20:7077

I give it a try with this simple R script (using the sparklyr package):

library(sparklyr)
library(dplyr)
# Set your SPARK_HOME path                                                                                                                                                       
Sys.setenv(SPARK_HOME="/home/master/spark/spark-2.3.0-bin-hadoop2.7/")
config <- spark_config()
# Optionally you can modify config parameters here                                                                                                                               

sc <- spark_connect(master = "spark://172.16.10.20:7077", spark_home = Sys.getenv("SPARK_HOME"), config = config)
# Some test code, copying data to Spark cluster                                                                                                                                  
iris_tbl <- copy_to(sc, iris)
src_tbls(sc)

spark_apply(iris_tbl, function(data) {
   return(head(data))
})

All commands are executed, fine and smooth (but a bit slow to my taste), and the spark log is kept in a temp file. When looking into the log file I see no mention of the slave node, which makes me wonder, whether this Spark is really running in a cluster mode.

How may I check that the master-slave relation is really working?

Open 172.16.10.20:8080, you will see all detected slaves. In Web UI you can see in job details, on which job which task was executed — T. Gawęda, Aug 31 '18 at 16:21

Chandan Ray · Answer 1 · 2018-08-31T17:16:42.697

0

In your case please check

172.16.10.20:8080 url and open executors tab to see the number executors running

edited Aug 31 '18 at 17:16

answered Aug 31 '18 at 16:38

Chandan Ray

2,031
1
10
15

Did you mean port 8080? – Xavier Prudent Aug 31 '18 at 16:52

score 0 · Answer 2 · answered Aug 31 '18 at 18:00

0

Here is URL

http://[driverHostname]:4040 by default

http://<master-ip>:8080(webui-port)

Additional info on a monitor and inspect Spark job executions

command based status check stack question

answered Aug 31 '18 at 18:00

devesh

618
6
26

How can I check that my Spark Cluster do work?

2 Answers2