
I would like to save a Dataset[Row] as a text file with a specific name in a specific location. Can anybody help me?

I have tried this, but it produces a folder (LOCAL_FOLDER_TEMP/filename) with a parquet file inside of it: Dataset.write.save(LOCAL_FOLDER_TEMP+filename)

Thanks

jorgemaagomes

3 Answers

You can't save your dataset to a specific filename using the Spark API; there are a couple of workarounds:

  1. As Vladislav suggested, collect() your dataset, then write it to the filesystem yourself using the Scala/Java/Python API.
  2. Apply repartition(1)/coalesce(1), write your dataset, and then rename the resulting part file.

Neither is really recommended: on large datasets the first can cause OOM on the driver, and both give up the power of Spark's parallelism.
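The second workaround can be sketched like this (a sketch, not a definitive recipe; `df`, `spark`, `tmpDir`, and `targetPath` are placeholders you would supply):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// 1. Write to a temporary directory as a single part file.
df.coalesce(1)
  .write
  .format("text")
  .save(tmpDir)

// 2. Rename the single part-* file to the desired filename
//    using the Hadoop FileSystem API.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val partFile = fs.globStatus(new Path(tmpDir + "/part-*"))(0).getPath
fs.rename(partFile, new Path(targetPath))
```

Note that coalesce(1) forces all data through a single task, so this only makes sense for output small enough to fit on one executor.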

The second issue, that you are getting a parquet file, is because parquet is Spark's default output format; for text you should use:

  df.write.format("text").save("/path/to/save")
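Be aware that the text source only writes a single string column per line, so a multi-column Dataset has to be flattened first. A minimal sketch (assuming `df` is your Dataset and a comma delimiter is acceptable):

```scala
import org.apache.spark.sql.functions.{col, concat_ws}

// Concatenate all columns into one string column named "value",
// which is the shape the "text" data source expects.
df.select(concat_ws(",", df.columns.map(col): _*).as("value"))
  .write
  .format("text")
  .save("/path/to/save")
```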
ShemTov

Please use

RDD.saveAsTextFile()

It writes the elements of the dataset as a text file (or set of text files) in a given directory on the local filesystem, HDFS, or any other Hadoop-supported file system. Spark calls toString on each element to convert it to a line of text in the file.

Reference: rdd-programming-guide
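For a Dataset[Row] this means dropping down to the underlying RDD first; a possible sketch (assuming `ds` is your Dataset[Row] and a comma delimiter suits you):

```scala
// Dataset -> RDD, then map each Row to a delimited line.
// saveAsTextFile still writes a directory of part files,
// one per partition, not a single named file.
ds.rdd
  .map(row => row.mkString(","))
  .saveAsTextFile("hdfs:///path/to/output")
```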

Ajinkya Bhore

Spark always creates multiple files, one file per partition. If you want a single file, you need to collect() the data and then just write it to a file the usual way.
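A minimal sketch of the collect-and-write approach (assuming `ds` is a small Dataset[Row]; everything is pulled onto the driver, so this will OOM on large data):

```scala
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._

// Bring all rows to the driver, render each as a delimited line,
// then write a single file with plain Java NIO.
val lines = ds.collect().map(_.mkString(",")).toSeq
Files.write(Paths.get("/tmp/output.txt"), lines.asJava)
```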

Vladislav Varslavans