
I did not configure any timeout value; I used the default settings. Where is this 3600-second timeout configured, and how can I solve this?

Error message:

18/01/10 13:51:44 WARN Executor: Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [3600 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
    at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:738)
    at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:767)
    at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:767)
    at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:767)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1948)
    at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:767)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [3600 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    ... 14 more
John

2 Answers


The error message says:

This timeout is controlled by spark.executor.heartbeatInterval

Hence, the first thing to try is increasing this value. This can be done in multiple ways, for example by increasing the value to 10000 seconds:

  • When using spark-submit simply add the flag:

    --conf spark.executor.heartbeatInterval=10000s
    
  • You can add a line in spark-defaults.conf:

    spark.executor.heartbeatInterval 10000s
    
  • When creating a new SparkSession in your program, add a config parameter (Scala):

    val spark = SparkSession.builder
      .config("spark.executor.heartbeatInterval", "10000s")
      .getOrCreate()
    

If this does not help, it is a good idea to try increasing the value of spark.network.timeout as well. It is another common source of problems related to these kinds of timeouts.
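For instance, a minimal sketch of a spark-submit invocation that raises both settings together (the application file my_job.py and the chosen values are placeholders, not recommendations for any particular workload):

    spark-submit \
      --conf spark.executor.heartbeatInterval=60s \
      --conf spark.network.timeout=600s \
      my_job.py

Note that the heartbeat interval is kept well below the network timeout here; the comments below explain why that ordering matters.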

Shaido
  • This issue did not happen every time. The default value of spark.executor.heartbeatInterval is 10s, so why does it take 3600s? – John Jan 12 '18 at 16:37
  • @John: Since it's not the default value, it must be configured somewhere; the three places I gave in the answer are a good place to start searching if you want to find it. The exception will not be thrown every time, it depends on the computational loads on the executors. Did you manage to solve it (so the exception is never thrown) by changing the timeouts? – Shaido Jan 12 '18 at 16:43
  • This answer does not seem to be correct. spark.executor.heartbeatInterval is the interval at which the executor sends a heartbeat to the driver. The driver will wait up to spark.network.timeout to receive a heartbeat. Setting spark.executor.heartbeatInterval to 10000s (larger than spark.network.timeout) does not make sense: the heartbeat interval should be significantly lower than the network timeout. – Shirish Kumar Jan 21 '21 at 17:59
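Following that comment, a configuration consistent with the Spark documentation keeps the heartbeat interval well below the network timeout. A minimal spark-defaults.conf sketch (the values are illustrative assumptions, not tuned for any specific cluster):

    spark.executor.heartbeatInterval  60s
    spark.network.timeout             600s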
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("SQL_DataFrame")
  .master("local")                                      // run locally in a single JVM
  .config("spark.network.timeout", "600s")              // default timeout for all network interactions
  .config("spark.executor.heartbeatInterval", "10000s") // interval between executor heartbeats to the driver
  .getOrCreate()

Tested. It solved the problem.

Charlie 木匠
  • Really strange, because spark.executor.heartbeatInterval should be significantly less than spark.network.timeout (from the Spark docs). – Artem Rybin Mar 19 '20 at 15:09
  • Yes @Artem, you are right. The spark.executor.heartbeatInterval value should always be less than the spark.network.timeout value. – Ranga Reddy Dec 21 '20 at 03:49
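Per the two comments above, a sketch of the same setup with the heartbeat interval lowered so it stays below the network timeout (the values are illustrative assumptions, not tested against this particular workload):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("SQL_DataFrame")
      .master("local")
      .config("spark.network.timeout", "600s")           // upper bound on waiting for heartbeat replies
      .config("spark.executor.heartbeatInterval", "60s") // kept well below spark.network.timeout
      .getOrCreate()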