When writing a CSV file in PySpark, a folder is automatically created, and the CSV file inside it gets a cryptic name. How can I write the CSV with a specific name, and without the folder being created? I want to do this in PySpark, not in pandas.
-
What command are you currently using to create the CSV? This other question provides several answers that don't depend on Pandas and let you specify the name of the output file. https://stackoverflow.com/questions/31385363/how-to-export-a-table-dataframe-in-pyspark-to-csv Most seem to use `.write.csv("my_csv.csv")`. – Sarah Messer Nov 09 '21 at 21:19
3 Answers
That's just the way Spark works, because of its parallelizing mechanism. A Spark application is meant to have one or more workers read your data and write it to a location. When you write a CSV file, a directory containing multiple files is what allows multiple workers to write at the same time.
If you're using HDFS, you can write a bash script to move or reorganize the files the way you want.
If you're using Databricks, you can use dbutils.fs (e.g. dbutils.fs.mv)
to interact with DBFS files in the same way.
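The "move the files afterwards" idea can be sketched without any Spark API at all: after something like `df.coalesce(1).write.csv(out_dir)`, Spark leaves a directory containing one `part-*.csv` file plus marker files such as `_SUCCESS`. A stdlib-only helper for a local filesystem (the function name and layout are my own, not part of Spark) can promote that part file to whatever name you want:

```python
# After df.coalesce(1).write.csv(out_dir), Spark leaves a directory such as
# out_dir/part-00000-<uuid>.csv plus a _SUCCESS marker. This helper renames
# the single part file to the name you actually want and removes the folder.
# Assumes a local filesystem; for HDFS/DBFS you would use hdfs dfs or
# dbutils.fs instead.
import glob
import os
import shutil


def promote_single_csv(out_dir: str, target_path: str) -> None:
    """Move the lone part-*.csv out of a Spark output directory."""
    parts = glob.glob(os.path.join(out_dir, "part-*.csv"))
    if len(parts) != 1:
        raise RuntimeError(f"expected exactly one part file, found {len(parts)}")
    shutil.move(parts[0], target_path)
    shutil.rmtree(out_dir)  # discard _SUCCESS and the now-empty folder
```

This is only safe after `coalesce(1)`, since it assumes exactly one part file exists.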

This is the way Spark is designed: it writes out multiple files in parallel, which is faster for big datasets. But you can still force a single output file by using `coalesce(1, true).saveAsTextFile()` on an RDD, or `df.coalesce(1).write.csv(...)` on a DataFrame.

-
No, actually I do want it split into parts, but it should not be saved inside a folder structure, and instead of the cryptic part-*** name that gets set for each CSV file, a specific name of my choosing should be used. – cookie Nov 11 '21 at 07:23
In PySpark, the following code helped me write data directly into a single, named CSV file (note that `toPandas()` collects the entire dataset onto the driver, so this only works when the data fits in driver memory):
df.toPandas().to_csv('FileName.csv')
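For context, `toPandas()` returns an ordinary pandas DataFrame, and it is pandas' `to_csv` that produces the single named file. A minimal illustration of that second half (the data and file name here are invented); passing `index=False` keeps pandas from writing an extra index column:

```python
# The pandas half of the pattern above: once you have a pandas DataFrame
# (whether from toPandas() or built directly), to_csv writes one named file.
# Data and file name are made up for illustration.
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df.to_csv("FileName.csv", index=False)  # index=False avoids an extra index column
```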
