
I have a DataFrame that I want to write as a single JSON file with a specific name. Instead, Spark creates a directory with that name and puts partition files inside it. How do I get it to write the data directly to the filename I pass in? Below is the code in Python:

df_3.coalesce(1).write.format('json').mode('overwrite').save(filename)

The data is now written to mylocation.json/part-00000, but I want it to be a single file named mylocation.json.

I would appreciate any help.

Young
    Refer to this: https://stackoverflow.com/questions/41990086/specifying-the-filename-when-saving-a-dataframe-as-a-csv – dsk Jul 08 '21 at 06:01
    https://stackoverflow.com/questions/40792434/spark-dataframe-save-in-single-file-on-hdfs-location – dsk Jul 08 '21 at 06:01

1 Answer


I think you need to use mode('append') instead. There is also no need for coalesce, unless you want to force the write to run on a single partition:

df_3.write.format('json').mode('append').save(filename)
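
Note that Spark's DataFrame writer treats the path passed to save() as a directory regardless of the save mode, so the part file still ends up inside it. One common workaround, sketched here under assumptions, is to write to a temporary directory and then move the single part file to the desired name. The helper name promote_single_part_file and the '.tmp' suffix are hypothetical, and the sketch assumes the output is on the local filesystem (not HDFS or S3) and that coalesce(1) produced exactly one part file:

```python
import glob
import os
import shutil


def promote_single_part_file(spark_output_dir, target_file):
    """Move the only part file in a Spark output directory to target_file.

    Hypothetical helper: assumes a local filesystem and exactly one part file.
    """
    if os.path.abspath(spark_output_dir) == os.path.abspath(target_file):
        raise ValueError("write to a temporary directory, not to the target path")

    # "part-*" skips _SUCCESS and the hidden .crc checksum files.
    part_files = glob.glob(os.path.join(spark_output_dir, "part-*"))
    if len(part_files) != 1:
        raise ValueError(f"expected exactly one part file, found {len(part_files)}")

    # An earlier run may have left a directory at the target path; remove it.
    if os.path.isdir(target_file):
        shutil.rmtree(target_file)

    shutil.move(part_files[0], target_file)
    shutil.rmtree(spark_output_dir)  # drop the now-unneeded temporary directory
    return target_file


# Intended usage with the question's variables (needs a running Spark session):
#   tmp_dir = filename + '.tmp'
#   df_3.coalesce(1).write.format('json').mode('overwrite').save(tmp_dir)
#   promote_single_part_file(tmp_dir, filename)
```

Keeping coalesce(1) matters here: without it Spark may write several part files, and the helper would refuse to pick one.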
Adrian Mole
iron_bat