I have a Spark cluster installed on an Ubuntu server with 3 worker nodes (using Docker Compose).

The master's web UI shows all of the cluster information, and it is accessible from my PC.

Now I want to add a job from java driver, so it would be something like this:

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
            .master("spark://nodemaster:7077")
            .appName("MongoSparkConnectorIntro")
            .config("spark.network.timeout", 2000000)
            .config("spark.driver.port", "32772")
            .config("spark.driver.host", "172.31.64.69")
            .config("spark.driver.bindAddress", "172.18.0.2")
            .config("spark.mongodb.input.uri", "mongodb://")
            .config("spark.mongodb.output.uri", "mongodb://")
            .getOrCreate();

When I run this from IntelliJ, the driver logs:

org.apache.spark.SparkContext: Running Spark version 2.4.1
org.apache.spark.SparkContext: Submitted application: MongoSparkConnectorIntro
org.apache.spark.SecurityManager: Changing view acls to: $USER
org.apache.spark.SecurityManager: Changing modify acls to: $USER
org.apache.spark.SecurityManager: Changing view acls groups to:
org.apache.spark.SecurityManager: Changing modify acls groups to:
org.apache.spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set($USER); groups with view permissions: Set(); users  with modify permissions: Set($USER); groups with modify permissions: Set()
org.apache.spark.util.Utils: Service 'sparkDriver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
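That last log line means the driver JVM could not open a listening socket on the configured `spark.driver.bindAddress`. As a sanity check, here is a small plain-Java sketch (no Spark required) that reproduces the same failure mode; `203.0.113.7` is a reserved TEST-NET address, used here only as a stand-in for an IP that does not belong to the local machine:

```java
import java.net.InetAddress;
import java.net.ServerSocket;

public class BindCheck {
    // Returns true if this JVM can open a listening socket on the given
    // address -- which is exactly what Spark's 'sparkDriver' service tries
    // to do at startup with spark.driver.bindAddress.
    static boolean canBind(String address) {
        // Port 0 = pick any free ephemeral port; 50 = default backlog.
        try (ServerSocket s = new ServerSocket(0, 50, InetAddress.getByName(address))) {
            return true;
        } catch (Exception e) {
            // BindException: the address is not an interface on this machine.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("0.0.0.0     -> " + canBind("0.0.0.0"));     // wildcard: always bindable
        System.out.println("203.0.113.7 -> " + canBind("203.0.113.7")); // non-local address: fails
    }
}
```

The likely culprit here: `172.18.0.2` looks like a Docker-internal address on the Ubuntu server, not an interface on the Windows machine where the driver actually runs, so the driver cannot bind to it. `spark.driver.bindAddress` must be an address that exists on the driver's own machine (or `0.0.0.0`), while `spark.driver.host` is the address the executors use to reach back to the driver.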

I'm using IntelliJ IDEA on Windows 10. I also added my IP to /etc/hosts.

Any help?

  • Not clear why you think you need `spark.driver` configurations. – OneCricketeer Dec 30 '19 at 12:45
  • @cricket_007 Because Spark runs on another server. I'm not sure; after some research, I ended up with this. – Hamix Dec 30 '19 at 12:49
  • 1
    The master runs on another server. The driver picks ports (and hosts) at random from available slaves. Also, I did something very similar with Mesos and found a good diagram to explain all the ports. https://stackoverflow.com/a/56486271/2308683 – OneCricketeer Dec 30 '19 at 12:51
  • 1
    I believe you'd have better luck with Spark on Kubernetes rather than Docker Compose because the network configurations are already taken care of there – OneCricketeer Dec 30 '19 at 12:52
  • 1
    Also, why run 3 small workers on one machine instead of letting one machine be one large worker? Running it like that wont achieve true parallelism because the same CPU cores and memory are all being shared by each container. Plus you've added an extra network hop – OneCricketeer Dec 30 '19 at 12:58
  • @cricket_007 You mean the performance won't be better than a single node? I read some articles about this; if that's true, what's the point of virtualization? Two VMs use the same CPU, so a single node with larger resources would do the same. – Hamix Dec 30 '19 at 14:53
  • Actually, we are looking for a solution for our big data; the organization uses VMware for its infrastructure. I'm the lead developer, and I chose this approach after some research, so you're telling me to install single-node Spark? I also have Kubernetes installed, but that cluster is used for our microservices, and I don't want to put extra load on it. – Hamix Dec 30 '19 at 15:01
  • 1
    VMs have isolated resources of memory and CPU, but still share physical hardware on one machine right? You still would have to run VMs on separated physical hosts to yield true parallelism. Plus, Spark spins up executors of a configurable amount of memory or cpu cores, so it should already be able to scale up to use all available host memory. You could setup a separate k8s cluster for Spark, if you're already familiar with that – OneCricketeer Dec 30 '19 at 18:38
  • @cricket_007 Thanks for your time. I'll go with k8s; it was hard to configure the network in Docker, and k8s is well documented. – Hamix Dec 30 '19 at 19:15

0 Answers