
I have a Spark (1.4.1) application, running on YARN, that fails with the following executor log entry:

16/07/21 23:09:08 ERROR executor.CoarseGrainedExecutorBackend: Driver 9.4.136.20:55995 disassociated! Shutting down.
16/07/21 23:09:08 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /dfs1/hadoop/yarn/local/usercache/mitchus/appcache/application_1465987751317_1172/blockmgr-f367f43b-f4c8-4faf-a829-530da30fb040/1c/temp_shuffle_581adb36-1561-4db8-a556-c4ac0e6400ed
java.io.FileNotFoundException: /dfs1/hadoop/yarn/local/usercache/mitchus/appcache/application_1465987751317_1172/blockmgr-f367f43b-f4c8-4faf-a829-530da30fb040/1c/temp_shuffle_581adb36-1561-4db8-a556-c4ac0e6400ed (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(BlockObjectWriter.scala:189)
    at org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:328)
    at org.apache.spark.util.collection.ExternalSorter.spill(ExternalSorter.scala:257)
    at org.apache.spark.util.collection.ExternalSorter.spill(ExternalSorter.scala:95)
    at org.apache.spark.util.collection.Spillable$class.maybeSpill(Spillable.scala:83)
    at org.apache.spark.util.collection.ExternalSorter.maybeSpill(ExternalSorter.scala:95)
    at org.apache.spark.util.collection.ExternalSorter.maybeSpillCollection(ExternalSorter.scala:240)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:220)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Any clues as to what might have gone wrong?

mitchus
  • Mind upgrading to 1.6.2 (or soon 2.0)? There were some issues reported similar to your case and fixed in the recent releases. – Jacek Laskowski Jul 23 '16 at 19:23
  • @JacekLaskowski I would like that, but it's not up to me. – mitchus Jul 25 '16 at 08:02
  • I got a similar message earlier today with Spark 2.0 under SparkR; restarting my session seemed to clear the error - probably won't help OP, but just sayin'. – russellpierce Aug 14 '16 at 21:48
  • Restarting Spark worked for me too, @rpierce. Thanks. – desaiankitb May 17 '17 at 06:02
  • Did you by any chance set the master to 'local' in your Spark context and then use spark-submit in yarn mode? – seagull1089 Jun 05 '17 at 22:33
  • @seagull1089, can you elaborate on where I can specify my Spark context as non-`local`? I am creating my SparkContext object as follows: `sc = SparkContext(appName = "Tracks")` – Ravi Chandra Jun 06 '17 at 07:22
  • @RaviChandra: something like this: `val conf = new SparkConf().setAppName("Application Name"); conf.setMaster("local[*]"); val sc = new SparkContext(conf)` – ceteras Jun 22 '17 at 06:57
  • Can it be related to https://stackoverflow.com/questions/25707629/why-does-spark-job-fail-with-too-many-open-files ? – ucsky Mar 13 '18 at 19:30

2 Answers


The error is caused by the temp shuffle file being deleted while it is still in use. That can happen for several reasons; the one I ran into was another executor being killed by YARN. Once an executor is killed, a shut-down signal is sent to the remaining executors, and the ShutdownHookManager then deletes all of the temp files that were registered with it. That is why you see this error. So you may need to check the logs for ShutdownHookManager entries.
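
For illustration only (this is not Spark's source code, and all names below are made up), here is a minimal Scala sketch of the behaviour described above: a JVM shutdown hook deletes a registered temp directory, so a writer that still holds a path inside it will subsequently fail with java.io.FileNotFoundException, just like the DiskBlockObjectWriter in the stack trace.

    import java.io.{File, FileOutputStream}
    import java.nio.file.Files

    object ShutdownHookIllustration extends App {
      // A scratch directory standing in for an executor's block-manager directory.
      val tempDir: File = Files.createTempDirectory("blockmgr-demo").toFile

      // Register cleanup on JVM shutdown, as Spark's ShutdownHookManager does
      // for the directories it manages.
      sys.addShutdownHook {
        Option(tempDir.listFiles()).foreach(_.foreach(_.delete()))
        tempDir.delete()
      }

      // A task that is still writing after the hook has run would try to reopen
      // a path that no longer exists and get "No such file or directory".
      val shuffleFile = new File(tempDir, "temp_shuffle_demo")
      val out = new FileOutputStream(shuffleFile)
      out.write(1)
      out.close()
    }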

lxy

You can try to increase spark.yarn.executor.memoryOverhead. If YARN is killing executors for exceeding their container memory limits, raising the overhead can prevent those kills (and the temp-file cleanup they trigger).
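
For example (a sketch with placeholder values, not a tuned recommendation), the setting can be placed on the SparkConf before the context is created:

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.yarn.executor.memoryOverhead is the extra off-heap memory (in MB)
    // that YARN adds to each executor container on top of spark.executor.memory.
    // The values below are placeholders; adjust them to your job and cluster.
    val conf = new SparkConf()
      .setAppName("Tracks")
      .set("spark.executor.memory", "4g")
      .set("spark.yarn.executor.memoryOverhead", "1024")

    val sc = new SparkContext(conf)

Equivalently it can be passed on the command line, e.g. spark-submit --conf spark.yarn.executor.memoryOverhead=1024 (in Spark 2.3 and later the property is named spark.executor.memoryOverhead).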

Prasad Khode
huron