I have a large DataFrame and need to write it to a CSV file with a custom header and footer. For example, adding "Hello" as the header and "World" as the footer. How can I achieve this in Spark?
-
If your dataframe has multiple partitions, you will end up with multiple CSV files when saving. You need to call `repartition(1)` to get a single file, but that comes at the cost of performance. As far as I know, there is no easy way to add an additional header/footer directly with Spark. – Shaido Aug 27 '19 at 09:40
-
First, you don't need to repartition; you can use `coalesce(1)`. Also, I figured out a way using an RDD, but I'm waiting to see if anyone has a better idea. – Tuong Le Aug 27 '19 at 09:49
-
If you want all data on a single partition, doing `repartition` or `coalesce` will be the same, i.e. all data will be put on a single node. Depending on your use case, I guess you could convert each row to a string, add the extra data and then save as a text file. Similar to the answer here: https://stackoverflow.com/questions/31898964/how-to-write-the-resulting-rdd-to-a-csv-file-in-spark-python. If the header/footer have the same format as the regular data in the dataframe, better solutions would be possible. – Shaido Aug 27 '19 at 09:56
-
Some things are simply not provided. – thebluephantom Aug 27 '19 at 13:09
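
For reference, the approach suggested in the comments (convert each row to a string, add the extra lines, save as a text file) might look roughly like the PySpark sketch below. The function names `row_to_csv_line`, `wrap_partition`, and `write_with_header_footer` are made up for illustration, and `df` is assumed to be a `pyspark.sql.DataFrame`. It collapses the data to a single partition, so it carries the performance cost mentioned above.

```python
import csv
import io


def row_to_csv_line(values):
    """Format one row (any iterable of values) as a single CSV line."""
    buf = io.StringIO()
    csv.writer(buf).writerow(values)
    # The csv module appends a line terminator; saveAsTextFile adds its own.
    return buf.getvalue().rstrip("\r\n")


def wrap_partition(rows, header, footer):
    """Yield the header, then every row as a CSV line, then the footer."""
    yield header
    for row in rows:
        yield row_to_csv_line(row)
    yield footer


def write_with_header_footer(df, path, header, footer):
    """Hypothetical helper: write df as one text file with custom first/last lines.

    Assumes df is a pyspark.sql.DataFrame. All rows are pulled into a single
    partition, so the header and footer are emitted exactly once.
    """
    single = df.rdd.coalesce(1)
    lines = single.mapPartitions(lambda rows: wrap_partition(rows, header, footer))
    lines.saveAsTextFile(path)
```

Calling `write_with_header_footer(df, "/tmp/out", "Hello", "World")` (the path is only an example) should produce a directory containing a single part file that starts with `Hello` and ends with `World`. The order of the data rows in between is whatever order the single partition yields them in, so sort beforehand if that matters.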