Why does Spark, when saving results to a file system, upload result files to a `_temporary` directory and then move them to the output folder, instead of uploading them directly to the output folder?
1 Answer
A two-stage process is the simplest way to ensure consistency of the final result when working with file systems.
You have to remember that each executor thread writes its result set independently of the others, and that the writes can happen at different moments in time or even reuse the same set of resources. At the moment of a write, Spark cannot determine whether all writes will succeed.
- In case of failure, one can roll back the changes by removing the temporary directory.
- In case of success, one can commit the changes by moving the temporary directory to its final destination (see the sketch below).
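A minimal sketch of the idea using the Hadoop `FileSystem` API. This is not Spark's actual committer code; the paths and the success flag are hypothetical, just enough to show the commit/rollback branches:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative only - a simplified version of the pattern, not
// Spark's FileOutputCommitter. All paths here are hypothetical.
val fs = FileSystem.get(new Configuration())

val tempDir  = new Path("/data/output/_temporary")
val finalDir = new Path("/data/output")

val allWritesSucceeded = true  // in Spark, the driver tracks this

if (allWritesSucceeded) {
  // Commit: on HDFS, rename is an atomic, metadata-only operation,
  // so the finalized file appears in the output folder all at once.
  fs.rename(new Path(tempDir, "part-00000"), new Path(finalDir, "part-00000"))
} else {
  // Rollback: a single recursive delete discards all partial output.
  fs.delete(tempDir, true)
}
```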
Another benefit of this model is the clear distinction between writes in progress and finalized output. As a result, it can be easily integrated with simple workflow management tools, without needing a separate state store or other synchronization mechanism.
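For example, the default Hadoop committer drops a `_SUCCESS` marker file into the output directory once the job commits, so a downstream tool only needs a file-existence check. A minimal sketch (the output path is hypothetical):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())

// A downstream job can poll for the marker instead of consulting a
// separate state store. /data/output is a hypothetical path.
val ready = fs.exists(new Path("/data/output/_SUCCESS"))
if (ready) {
  // Safe to read: the output directory has been fully committed.
}
```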
This model is simple and reliable, and it works well with the file systems for which it has been designed. Unfortunately, it doesn't perform as well with object stores, which don't support atomic moves and typically implement a rename as a copy followed by a delete.
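For what it's worth, one commonly used mitigation in that situation (a configuration sketch, assuming the standard Hadoop `FileOutputCommitter` is in use) is commit algorithm version 2, which moves each task's output directly to the destination and skips the job-level rename pass, at the cost of weaker failure guarantees:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("commit-v2-example")
  // Algorithm v2 commits each task's files straight to the final
  // directory, avoiding the second (job-level) rename over the
  // whole output - useful where renames are copy-and-delete.
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  .getOrCreate()
```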

- Is the `_temporary` directory supposed to be deleted once the job run completes? – Omkar Puttagunta Aug 30 '18 at 17:59
- @OmkarPuttagunta Normally it should be (to be precise, it should be atomically moved). However, there are cases when this step cannot be completed - https://stackoverflow.com/q/51603404/6910411 – zero323 Sep 01 '18 at 00:10
- In my case, I am running in `standalone mode` and don't have `HDFS` on the cluster! So one worker is writing `part files` directly, while the other is always creating `part files` in `_temporary/`; I think that is the issue – Omkar Puttagunta Sep 01 '18 at 01:24
- @OmkarPuttagunta Yeah, without shared storage the write process cannot be completed - you'll have partial (uncommitted) results scattered all over the cluster. – zero323 Sep 01 '18 at 10:52