
I am getting the following error while trying to save an RDD to HDFS:

17/09/13 17:06:42 WARN TaskSetManager: Lost task 7340.0 in stage 16.0 (TID 100118, XXXXXX.com, executor 2358): java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:865)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:401)
        Suppressed: java.lang.IllegalArgumentException: Self-suppression not permitted
                at java.lang.Throwable.addSuppressed(Throwable.java:1043)
                at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
                at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
                at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
                at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$8.apply$mcV$sp(PairRDDFunctions.scala:1218)
                at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1359)
                at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1218)
                at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1197)
                at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
                at org.apache.spark.scheduler.Task.run(Task.scala:99)
                at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                at java.lang.Thread.run(Thread.java:748)
        [CIRCULAR REFERENCE:java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.]

The final task in the stage is `.saveAsTextFile()`. In the Spark UI I can see that the other tasks prior to `.saveAsTextFile()` finish successfully. I am using Spark 2.0.0 in YARN mode.
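The write step looks roughly like this (I cannot share the actual code, so the path and variable names below are placeholders):

```scala
import org.apache.spark.rdd.RDD

// Rough shape of the failing step: the RDD is built in earlier stages and the
// last action writes it out as text to HDFS. Names and paths are illustrative only.
val outputPath = "hdfs:///user/xxx/output"   // placeholder path
val records: RDD[String] = ???               // produced by the upstream transformations

records.saveAsTextFile(outputPath)           // this is where the pipeline-recovery IOException surfaces
```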

EDIT: I have already seen the answer on [Spark: Self-suppression not permitted when writing big file to HDFS](https://stackoverflow.com/questions/34390854/spark-self-suppression-not-permitted-when-writing-big-file-to-hdfs) and made sure that the issues mentioned in that answer do not apply here.

vdep
  • can you provide the code too? – Ramesh Maharjan Sep 12 '17 at 14:43
  • @RameshMaharjan Sorry, I cannot share the code publicly, but I think the problem is with the last step (i.e. `.saveAsTextFile()`). I can see that 99% of the tasks in that stage get completed. – vdep Sep 12 '17 at 14:49
  • can you post full error message then? – Ramesh Maharjan Sep 12 '17 at 15:34
  • @RameshMaharjan, well, this is the full message; everything else is just INFOs – vdep Sep 12 '17 at 15:37
  • read this blog http://blog.dandoy.org/2012/12/self-suppression-not-permitted.html and implement try/catch. :) I hope it helps – Ramesh Maharjan Sep 12 '17 at 15:47
  • this is a known issue, fixed in Spark 2.2.1 – [JIRA](https://issues.apache.org/jira/browse/SPARK-21170) – Rahul Sharma Sep 12 '17 at 19:13
  • Possible duplicate of [Spark: Self-suppression not permitted when writing big file to HDFS](https://stackoverflow.com/questions/34390854/spark-self-suppression-not-permitted-when-writing-big-file-to-hdfs) – Rahul Sharma Sep 12 '17 at 19:13
  • I'm facing this too while using `df.write` for large dataframes. Please update the thread if you find a solution. – philantrovert Sep 13 '17 at 07:11
  • @RameshMaharjan, I have updated the log. can you please have a look at it now? – vdep Sep 13 '17 at 17:23
  • @philantrovert, are you getting `java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.` for `df.write`(dataframe) too? – vdep Sep 13 '17 at 17:24
  • did you implement `try catch` and see what the exact error is? I didn't see any difference in your update from the previous one – Ramesh Maharjan Sep 13 '17 at 23:31
  • @vdep I was getting the Self Suppression error. I increased the number of executors and executor memory a little, and submitted the job with `--conf "spark.speculation=true"` (sketched below). It seems to be working now but is still taking a lot of time to write to HDFS: about 30 mins for a table with 1 million rows. – philantrovert Sep 14 '17 at 05:53
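For reference, a minimal sketch of the workaround philantrovert describes above; the executor count and memory values here are assumptions and should be tuned to the cluster:

```scala
import org.apache.spark.SparkConf

// Settings mentioned in the comment above; the exact values are illustrative, not from the thread.
val conf = new SparkConf()
  .set("spark.speculation", "true")        // re-launch straggling write tasks speculatively
  .set("spark.executor.instances", "20")   // "increased the number of executors" - value assumed
  .set("spark.executor.memory", "6g")      // "executor memory a little" - value assumed

// Equivalent on the command line: spark-submit --conf "spark.speculation=true" ...
```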

0 Answers