
I'm trying to use Spark 3 from a Spring Boot 2.7.3 application. I'm working in a Docker Compose environment on Windows 10 with Docker Desktop.

Here is my docker-compose.yml:

version: '3'
services:
  spark-master:
    image: bde2020/spark-master:3.3.0-hadoop3.3
    container_name: spark-master
    ports:
      - "8088:8080"
      - "7077:7077"
    environment:
      - INIT_DAEMON_STEP=setup_spark

  spark-worker-1:
    image: bde2020/spark-worker:3.3.0-hadoop3.3
    container_name: spark-worker-1
    depends_on:
      - spark-master
    ports:
      - "8081:8081"
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"


  spark-worker-2:
    image: bde2020/spark-worker:3.3.0-hadoop3.3
    container_name: spark-worker-2
    depends_on:
      - spark-master
    ports:
      - "8082:8081"
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"

My Spring server runs on my local Windows machine and is therefore not part of the Compose file. This is how I configure the connection to Spark:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SparkConfig {

    @Bean
    public JavaSparkContext sparkContext() {
        SparkConf sparkConf = new SparkConf()
                .setAppName("SparkSpringBootApplication")
                .setMaster("spark://localhost:7077");
        return new JavaSparkContext(sparkConf);
    }
}
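For reference, here is a variant of the same configuration that pins the driver's advertised address and ports, in case the problem is the executors calling back to an ephemeral port on my machine. This is only a sketch of something I could try, not a confirmed fix; the address host.docker.internal and the port numbers 7078/7079 are assumptions on my part.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SparkConfig {

    @Bean
    public JavaSparkContext sparkContext() {
        SparkConf sparkConf = new SparkConf()
                .setAppName("SparkSpringBootApplication")
                .setMaster("spark://localhost:7077")
                // Address the containerized workers should use to reach the
                // driver on the Windows host (assumption: reachable from Docker)
                .set("spark.driver.host", "host.docker.internal")
                // Bind locally on all interfaces
                .set("spark.driver.bindAddress", "0.0.0.0")
                // Fixed ports (arbitrary choices) instead of random ephemeral
                // ones, so they could be opened in the Windows firewall
                .set("spark.driver.port", "7078")
                .set("spark.blockManager.port", "7079");
        return new JavaSparkContext(sparkConf);
    }
}
```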

Now when I launch my server, these logs scroll in a loop and prevent me from launching any processing:

2023-03-15 13:49:15.433  INFO 15904 --- [rainedScheduler] o.a.s.s.BlockManagerMaster               : Removal of executor 323 requested
2023-03-15 13:49:15.433  INFO 15904 --- [ckManagerMaster] o.a.s.s.BlockManagerMasterEndpoint       : Trying to remove executor 323 from BlockManagerMaster.
2023-03-15 13:49:15.433  INFO 15904 --- [rainedScheduler] seGrainedSchedulerBackend$DriverEndpoint : Asked to remove non-existent executor 323
2023-03-15 13:49:15.433  INFO 15904 --- [er-event-loop-5] o.a.s.s.c.StandaloneSchedulerBackend     : Granted executor ID app-20230315123815-0009/325 on hostPort 172.18.0.5:38035 with 8 core(s), 1024.0 MiB RAM
2023-03-15 13:49:15.455  INFO 15904 --- [er-event-loop-2] s.d.c.StandaloneAppClient$ClientEndpoint : Executor updated: app-20230315123815-0009/325 is now RUNNING
2023-03-15 13:49:19.539  INFO 15904 --- [er-event-loop-1] s.d.c.StandaloneAppClient$ClientEndpoint : Executor updated: app-20230315123815-0009/324 is now EXITED (Command exited with code 1)
2023-03-15 13:49:19.539  INFO 15904 --- [er-event-loop-1] o.a.s.s.c.StandaloneSchedulerBackend     : Executor app-20230315123815-0009/324 removed: Command exited with code 1
2023-03-15 13:49:19.540  INFO 15904 --- [er-event-loop-1] s.d.c.StandaloneAppClient$ClientEndpoint : Executor added: app-20230315123815-0009/326 on worker-20230315095819-172.18.0.4-33639 (172.18.0.4:33639) with 8 core(s)
2023-03-15 13:49:19.540  INFO 15904 --- [rainedScheduler] o.a.s.s.BlockManagerMaster               : Removal of executor 324 requested
2023-03-15 13:49:19.540  INFO 15904 --- [ckManagerMaster] o.a.s.s.BlockManagerMasterEndpoint       : Trying to remove executor 324 from BlockManagerMaster.
2023-03-15 13:49:19.540  INFO 15904 --- [rainedScheduler] seGrainedSchedulerBackend$DriverEndpoint : Asked to remove non-existent executor 324
2023-03-15 13:49:19.540  INFO 15904 --- [er-event-loop-1] o.a.s.s.c.StandaloneSchedulerBackend     : Granted executor ID app-20230315123815-0009/326 on hostPort 172.18.0.4:33639 with 8 core(s), 1024.0 MiB RAM
2023-03-15 13:49:19.561  INFO 15904 --- [er-event-loop-0] s.d.c.StandaloneAppClient$ClientEndpoint : Executor updated: app-20230315123815-0009/326 is now RUNNING
2023-03-15 13:49:19.802  INFO 15904 --- [er-event-loop-7] s.d.c.StandaloneAppClient$ClientEndpoint : Executor updated: app-20230315123815-0009/325 is now EXITED (Command exited with code 1)
2023-03-15 13:49:19.802  INFO 15904 --- [er-event-loop-7] o.a.s.s.c.StandaloneSchedulerBackend     : Executor app-20230315123815-0009/325 removed: Command exited with code 1
2023-03-15 13:49:19.802  INFO 15904 --- [rainedScheduler] o.a.s.s.BlockManagerMaster               : Removal of executor 325 requested
2023-03-15 13:49:19.802  INFO 15904 --- [er-event-loop-5] s.d.c.StandaloneAppClient$ClientEndpoint : Executor added: app-20230315123815-0009/327 on worker-20230315095819-172.18.0.5-38035 (172.18.0.5:38035) with 8 core(s)
2023-03-15 13:49:19.802  INFO 15904 --- [ckManagerMaster] o.a.s.s.BlockManagerMasterEndpoint       : Trying to remove executor 325 from BlockManagerMaster.
2023-03-15 13:49:19.802  INFO 15904 --- [rainedScheduler] seGrainedSchedulerBackend$DriverEndpoint : Asked to remove non-existent executor 325
2023-03-15 13:49:19.802  INFO 15904 --- [er-event-loop-5] o.a.s.s.c.StandaloneSchedulerBackend     : Granted executor ID app-20230315123815-0009/327 on hostPort 172.18.0.5:38035 with 8 core(s), 1024.0 MiB RAM
2023-03-15 13:49:19.823  INFO 15904 --- [er-event-loop-6] s.d.c.StandaloneAppClient$ClientEndpoint : Executor updated: app-20230315123815-0009/327 is now RUNNING

The only errors I can find are on the Spark worker side:

Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:424)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:413)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$9(CoarseGrainedExecutorBackend.scala:444)
    at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
    at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:442)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
    ... 4 more
Caused by: java.io.IOException: Failed to connect to host.docker.internal/192.168.65.2:53999
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: host.docker.internal/192.168.65.2:53999
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

It seems Spark can't reach my Spring server? I don't understand where the error comes from. Note that I was careful to use the same Spark/Hadoop versions in all my dependencies.

Any help would be welcome.

Thank you all and have a nice day.

  • I think the worker is not able to bind to the driver, this may answer your question: https://stackoverflow.com/questions/45489248/running-spark-driver-program-in-docker-container-no-connection-back-from-execu – Abdennacer Lachiheb Mar 15 '23 at 14:29
  • I don't think Spring Boot is the problem, and it isn't really necessary to use it. Where are you running `spark-submit` from? If not inside a container, then the `host.docker.internal` address will not resolve – OneCricketeer Mar 15 '23 at 21:13
