
I would like to save a Dataset[Row] as a text file with a specific name in a specific location. Can anybody help me?

I have tried this, but it produces a folder (LOCAL_FOLDER_TEMP/filename) with a parquet file inside of it: Dataset.write.save(LOCAL_FOLDER_TEMP+filename)

Thanks

jorgemaagomes

3 Answers

You can't save your dataset to a specific filename using the Spark API; there are a couple of workarounds:

  1. As Vladislav suggested, collect() your dataset, then write it to the filesystem yourself using the Scala/Java/Python API.
  2. Apply repartition(1)/coalesce(1), write your dataset, and then rename the resulting part file.

Neither is really recommended: on large datasets the first can cause OOM on the driver, and both give up the power of Spark's parallelism.
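The second workaround can be sketched like this (a sketch, not a definitive recipe; `df`, `spark`, `tmpDir`, and `targetPath` are placeholders you would supply):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// 1. Write to a temporary directory as a single part file.
df.coalesce(1)
  .write
  .format("text")
  .save(tmpDir)

// 2. Rename the single part-* file to the desired filename
//    using the Hadoop FileSystem API.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val partFile = fs.globStatus(new Path(tmpDir + "/part-*"))(0).getPath
fs.rename(partFile, new Path(targetPath))
```

Note that coalesce(1) forces all data through a single task, so this only makes sense for output small enough to fit on one executor.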

The second issue, that you are getting a parquet file, is because parquet is Spark's default output format; for text you should use:

  df.write.format("text").save("/path/to/save")
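Be aware that the text source only writes a single string column per line, so a multi-column Dataset has to be flattened first. A minimal sketch (assuming `df` is your Dataset and a comma delimiter is acceptable):

```scala
import org.apache.spark.sql.functions.{col, concat_ws}

// Concatenate all columns into one string column named "value",
// which is the shape the "text" data source expects.
df.select(concat_ws(",", df.columns.map(col): _*).as("value"))
  .write
  .format("text")
  .save("/path/to/save")
```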
ShemTov

Please use

RDD.saveAsTextFile()

It writes the elements of the dataset as a text file (or set of text files) in a given directory on the local filesystem, HDFS, or any other Hadoop-supported file system. Spark calls toString on each element to convert it to a line of text in the file.

Reference: rdd-programming-guide
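For a Dataset[Row] this means dropping down to the underlying RDD first; a possible sketch (assuming `ds` is your Dataset[Row] and a comma delimiter suits you):

```scala
// Dataset -> RDD, then map each Row to a delimited line.
// saveAsTextFile still writes a directory of part files,
// one per partition, not a single named file.
ds.rdd
  .map(row => row.mkString(","))
  .saveAsTextFile("hdfs:///path/to/output")
```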

Ajinkya Bhore

Spark always creates multiple files, one file per partition. If you want a single file, you need to collect() the data and then just write it to a file the usual way.
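A minimal sketch of the collect-and-write approach (assuming `ds` is a small Dataset[Row]; everything is pulled onto the driver, so this will OOM on large data):

```scala
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._

// Bring all rows to the driver, render each as a delimited line,
// then write a single file with plain Java NIO.
val lines = ds.collect().map(_.mkString(",")).toSeq
Files.write(Paths.get("/tmp/output.txt"), lines.asJava)
```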

Vladislav Varslavans