Saving dataframe to json file with a specific name without creating partition files in Pyspark

Question

I have a dataframe that I want to write as a single JSON file with a specific name. Instead, Spark creates a directory with that name and puts a partition file inside it. How do I get it to write the data directly to the filename I pass as a parameter? Below is the Python code:

df_3.coalesce(1).write.format('json').mode('overwrite').save(filename)

The data is now written to mylocation.json/part-00000, but I want it to be a single file named mylocation.json.

I would appreciate any help.

Refer to this: https://stackoverflow.com/questions/41990086/specifying-the-filename-when-saving-a-dataframe-as-a-csv - dsk, Jul 08 '21 at 06:01
https://stackoverflow.com/questions/40792434/spark-dataframe-save-in-single-file-on-hdfs-location - dsk, Jul 08 '21 at 06:01

score 0 · Answer 1 · edited May 29 '23 at 06:45

I think you need to use mode('append') instead. There is also no need for coalesce, unless you want to force the write to run on a single partition:

df_3.write.format('json').mode('append').save(filename)
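
Note that DataFrameWriter.save treats the path as an output directory in every save mode, so changing the mode alone will still leave part files under it. A common workaround is to write with coalesce(1) to a temporary directory and then move the single part file to the desired name. The helper below is only a minimal sketch of that idea, assuming the output lands on a local filesystem (not HDFS, S3, or DBFS); the function name promote_single_part_file and the temporary directory name are hypothetical, not part of PySpark.

```python
import glob
import os
import shutil


def promote_single_part_file(spark_output_dir, target_path):
    """Move the only part-* file in a Spark output directory to target_path.

    Hypothetical helper: assumes a local filesystem and a write that was
    done with coalesce(1), so exactly one part file is expected.
    """
    # "part-*" skips _SUCCESS and the hidden .part-*.crc checksum files.
    part_files = glob.glob(os.path.join(spark_output_dir, "part-*"))
    if len(part_files) != 1:
        raise ValueError(
            f"expected exactly one part file, found {len(part_files)}"
        )
    if os.path.isdir(target_path):
        # Likely left over from an earlier save(filename); refuse to guess.
        raise IsADirectoryError(f"{target_path} is a directory")
    if os.path.isfile(target_path):
        os.remove(target_path)  # mimic mode('overwrite') for the final file

    shutil.move(part_files[0], target_path)
    shutil.rmtree(spark_output_dir)  # drop _SUCCESS, .crc files, and the dir
    return target_path


# Intended usage with the dataframe from the question (not executed here):
#   tmp_dir = filename + ".tmp"   # hypothetical temporary directory name
#   df_3.coalesce(1).write.format('json').mode('overwrite').save(tmp_dir)
#   promote_single_part_file(tmp_dir, filename)
```

On HDFS or cloud storage the same rename step would have to go through that storage system's own file API rather than os and shutil.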

edited May 29 '23 at 06:45

Adrian Mole

answered May 29 '23 at 03:39

iron_bat

