10

I am now learning how to use Spark. I have a piece of code that can invert a matrix, and it works when the order of the matrix is small, like 100. But when the order of the matrix is large, like 2000, I get an exception like this:

15/05/10 20:31:00 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/spark-local-20150510200122-effa/28/temp_shuffle_6ba230c3-afed-489b-87aa-91c046cadb22

java.io.IOException: No space left on device

In my program I have lots of lines like this:

val result1=matrix.map(...).reduce(...)
val result2=result1.map(...).reduce(...)
val result3=matrix.map(...)

(sorry, the code is too long to write out here)

So I think that when I do this, Spark creates some new RDDs, and since my program creates too many RDDs, I get the exception. I am not sure whether what I think is correct.

How can I delete the RDDs that I won't use any more, like result1 and result2?

I have tried rdd.unpersist(), but it doesn't work.
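To make it concrete, here is a simplified sketch of what I mean (assuming matrix is an RDD[Array[Double]]; the real functions are different and are omitted here):

// Simplified sketch with placeholder functions, not my real code:
// cache an intermediate RDD while it is reused, then try to release it.
val step1 = matrix.map(row => row.map(_ * 2.0))   // some intermediate RDD
step1.cache()                                      // keep it around while it is reused

val result1 = step1.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })

step1.unpersist()                                  // what I tried, to free it afterwards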

赵祥宇
  • I might be wrong, but usually Spark keeps everything in memory, and if it's filling your hard drive you probably didn't give it enough RAM to start with. Anyway, you can't delete RDDs that you "think" you are not using anymore. – Vittorio Cozzolino May 11 '15 at 08:32
  • You should not have to delete them. result_i is kept only as long as it is needed to compute result_{i+1} (it can still be stored, but it can get overridden). It's possible that you can't store the temp files from one of your computations. – abalcerek May 11 '15 at 08:58
  • But I don't know why I get the IOException that says there is no space left on the device... – 赵祥宇 May 11 '15 at 09:09
  • 2
    This answer from the Databricks support forum may be relevant: https://forums.databricks.com/questions/277/how-do-i-avoid-the-no-space-left-on-device-error.html – Josh Rosen May 11 '15 at 20:22

3 Answers

12

This is because Spark creates some temp shuffle files under the /tmp directory of your local system. You can avoid this issue by setting the properties below in your Spark conf files.

Set the following properties in spark-env.sh.
(Change the directories to whichever directories in your infrastructure have write permissions set and enough free space.)

SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"

export SPARK_JAVA_OPTS

You can also set the spark.local.dir property in $SPARK_HOME/conf/spark-defaults.conf, as stated by @Eugene below.
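As a minimal sketch (not part of this answer; the app name and paths are placeholders), the same property can also be set programmatically before the SparkContext is created. Note that in cluster deployments it may be overridden by SPARK_LOCAL_DIRS / LOCAL_DIRS set by the cluster manager:

import org.apache.spark.{SparkConf, SparkContext}

// Point Spark's scratch space at directories with enough free disk.
val conf = new SparkConf()
  .setAppName("matrix-inversion")                   // placeholder app name
  .set("spark.local.dir", "/mnt/spark,/mnt2/spark") // directories with enough space
val sc = new SparkContext(conf)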

rahul gulati
4

According to the error message you have provided, your situation is that there is no disk space left on your hard drive. However, it is not caused by RDD persistence, but by the shuffle that you implicitly require when calling reduce.

Therefore, you should clear your drive and make more space available for your tmp folder.
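If you want to confirm where those temporary files go, a minimal sketch (assuming a plain Spark setup, where /tmp is the default) is to read the effective spark.local.dir from the running context:

// Check which directory Spark uses for shuffle and other temp files.
// "/tmp" is the default when spark.local.dir has not been configured.
val localDir = sc.getConf.get("spark.local.dir", "/tmp")
println(s"Temp/shuffle files are written under: $localDir")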

yjshen
1

As a complement, to specify the default folder for your shuffle tmp files, you can add the line below to $SPARK_HOME/conf/spark-defaults.conf:

spark.local.dir /mnt/nvme/local-dir,/mnt/nvme/local-dir2

Eugene