
I am trying to write a DataFrame to a text file after applying map and reduce operations. The code below creates 8 files, but I need only one file:

df3.rdd.map(_.toSeq.map(_+"").reduce(_+" "+_)).saveAsTextFile("/home/ram/Desktop/test4")

How can I write the content to a single file?
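For reference, the mapping used here (and in the answers) turns each Row into one space-separated line. The following is a minimal plain-Scala sketch that needs no Spark. It uses a Seq[Any] as a hypothetical stand-in for Row.toSeq, and the object and method names are illustrative only. It suggests that mkString(" ") produces the same line and, unlike reduce, does not throw on an empty row:

```scala
object RowFormatting {
  // The question's expression, applied to a Seq[Any] standing in for Row.toSeq.
  // v => s"$v" plays the role of _ + "" (convert each value to a String).
  def viaReduce(values: Seq[Any]): String =
    values.map(v => s"$v").reduce(_ + " " + _)

  // Same output for non-empty rows; returns "" instead of throwing on an empty row.
  def viaMkString(values: Seq[Any]): String =
    values.mkString(" ")

  def main(args: Array[String]): Unit = {
    val row: Seq[Any] = Seq(1, "abc", 2.5)
    println(viaReduce(row))   // 1 abc 2.5
    println(viaMkString(row)) // 1 abc 2.5
    println(viaMkString(Seq.empty) == "") // true
    // viaReduce(Seq.empty) would throw UnsupportedOperationException
  }
}
```

Inside Spark, the same idea would be written as df3.rdd.map(_.toSeq.mkString(" ")).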

2 Answers


The best option is coalesce. The coalesce method reduces the number of partitions in a DataFrame, and with a single partition saveAsTextFile writes a single part file.

Here is the code for your question:

df3.coalesce(1).rdd.map(_.toSeq.map(_+"").reduce(_+" "+_)).saveAsTextFile("/home/ram/Desktop/test4")

coalesce performs better here than repartition because it avoids a full shuffle: it merges existing partitions instead of moving every row across the network. Please check the link below:

Spark - repartition() vs coalesce()
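A self-contained sketch of the coalesce approach follows. It assumes Spark 2.x or 3.x on the classpath and runs in local mode. The sample data, the temporary output path and the names SingleFileWrite, writeSingleFile and run are hypothetical, and df3 is only a stand-in for the question's DataFrame. The sketch starts from 8 partitions, coalesces to 1, and counts the part files that saveAsTextFile produces:

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleFileWrite {
  // Format each Row as one space-separated line, then merge everything into
  // one partition so saveAsTextFile writes a single part file.
  def writeSingleFile(df: DataFrame, outDir: String): Unit =
    df.coalesce(1).rdd
      .map(_.toSeq.mkString(" "))
      .saveAsTextFile(outDir) // throws if outDir already exists

  // Returns (partitions before coalescing, part files written).
  def run(): (Int, Int) = {
    val spark = SparkSession.builder()
      .appName("single-file-write")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical stand-in for df3, spread over 8 partitions
      val df3 = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name").repartition(8)
      val before = df3.rdd.getNumPartitions

      // saveAsTextFile creates the directory itself, so use a path that does not exist yet
      val outDir = Files.createTempDirectory("spark-demo").resolve("test4").toString
      writeSingleFile(df3, outDir)

      // Count only part-NNNNN data files (ignore _SUCCESS and .crc files)
      val parts = new File(outDir).listFiles().count(_.getName.startsWith("part-"))
      (before, parts)
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = {
    val (before, parts) = run()
    println(s"partitions before: $before, part files written: $parts")
  }
}
```

The output is still a directory (here test4/) that holds one part-00000 file plus a _SUCCESS marker, because saveAsTextFile always writes a directory. Also, coalesce(1) does not shuffle, so the work before it may run in a single task. For large inputs, the Spark documentation for coalesce suggests repartition(1) instead. That option adds a shuffle step but keeps the upstream partitions running in parallel.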

Pala

Spark creates multiple files because each partition is saved as its own part file. If you need a single output file inside the output folder, you can repartition or coalesce the data to one partition before writing:

df3.repartition(1).rdd.map(_.toSeq.map(_+"").reduce(_+" "+_)).saveAsTextFile("/home/ram/Desktop/test4")
dassum