10

I am now learning how to use Spark. I have a piece of code that can invert a matrix, and it works when the order of the matrix is small, like 100. But when the order of the matrix is large, like 2000, I get an exception like this:

15/05/10 20:31:00 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/spark-local-20150510200122-effa/28/temp_shuffle_6ba230c3-afed-489b-87aa-91c046cadb22

java.io.IOException: No space left on device

In my program I have lots of lines like this:

val result1=matrix.map(...).reduce(...)
val result2=result1.map(...).reduce(...)
val result3=matrix.map(...)

(sorry, the code is too long to write out here)

So I think that when I do this, Spark creates some new RDDs, and since my program creates too many RDDs, I get the exception. I am not sure whether what I think is correct.

How can I delete the RDDs that I won't use any more, like result1 and result2?

I have tried rdd.unpersist(), but it doesn't work.
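To make it concrete, here is a simplified sketch of what I mean (assuming matrix is an RDD[Array[Double]]; the real functions are different and are omitted here):

// Simplified sketch with placeholder functions, not my real code:
// cache an intermediate RDD while it is reused, then try to release it.
val step1 = matrix.map(row => row.map(_ * 2.0))   // some intermediate RDD
step1.cache()                                      // keep it around while it is reused

val result1 = step1.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })

step1.unpersist()                                  // what I tried, to free it afterwards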

赵祥宇
  • I might be wrong, but usually Spark keeps everything in memory, and if it's filling your hard drive you probably didn't give it enough RAM to start with. Anyway, you can't delete RDDs that you "think" you are not using anymore. – Vittorio Cozzolino May 11 '15 at 08:32
  • You should not have to delete them. result_i is kept only as long as it is needed to compute result_{i+1} (it can still be stored, but it can get overridden). It's possible that you can't store the temp files from one of your computations. – abalcerek May 11 '15 at 08:58
  • But I don't know why I get the IOException that says there is no space left on the device... – 赵祥宇 May 11 '15 at 09:09
  • 2
    This answer from the Databricks support forum may be relevant: https://forums.databricks.com/questions/277/how-do-i-avoid-the-no-space-left-on-device-error.html – Josh Rosen May 11 '15 at 20:22

3 Answers

12

This is because Spark creates some temp shuffle files under the /tmp directory of your local system. You can avoid this issue by setting the properties below in your Spark conf files.

Set the following properties in spark-env.sh.
(Change the directories to whichever directories in your infrastructure have write permissions set and enough free space.)

SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"

export SPARK_JAVA_OPTS

You can also set the spark.local.dir property in $SPARK_HOME/conf/spark-defaults.conf, as stated by @Eugene below.
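As a minimal sketch (not part of this answer; the app name and paths are placeholders), the same property can also be set programmatically before the SparkContext is created. Note that in cluster deployments it may be overridden by SPARK_LOCAL_DIRS / LOCAL_DIRS set by the cluster manager:

import org.apache.spark.{SparkConf, SparkContext}

// Point Spark's scratch space at directories with enough free disk.
val conf = new SparkConf()
  .setAppName("matrix-inversion")                   // placeholder app name
  .set("spark.local.dir", "/mnt/spark,/mnt2/spark") // directories with enough space
val sc = new SparkContext(conf)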

rahul gulati
4

According to the error message you have provided, your situation is that there is no disk space left on your hard drive. However, it is not caused by RDD persistence, but by the shuffle that you implicitly require when calling reduce.

Therefore, you should clear your drive and make more space available for your tmp folder.
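If you want to confirm where those temporary files go, a minimal sketch (assuming a plain Spark setup, where /tmp is the default) is to read the effective spark.local.dir from the running context:

// Check which directory Spark uses for shuffle and other temp files.
// "/tmp" is the default when spark.local.dir has not been configured.
val localDir = sc.getConf.get("spark.local.dir", "/tmp")
println(s"Temp/shuffle files are written under: $localDir")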

yjshen
1

As a complement, to specify the default folder for your shuffle tmp files, you can add the line below to $SPARK_HOME/conf/spark-defaults.conf:

spark.local.dir /mnt/nvme/local-dir,/mnt/nvme/local-dir2

Eugene