-1

I am trying to write the dataframe to csv file with option sheetName but it is not working for me.

df13.coalesce(1).write.option("delimiter",",").mode(SaveMode.Overwrite).option("sheetName","Info").option("header","true").option("escape","").option("quote","").csv("path")

Can anyone help me on that

3 Answers3

0

I don't think in CSV file you actually have a sheet name , ideally the filename is the sheet name in a CSV file. Can you try changing to excel and try..

dsk
  • 1,863
  • 2
  • 10
  • 13
  • how can we use excel and write. – user13748181 Jun 15 '20 at 08:03
  • Can you please look into this - https://github.com/crealytics/spark-excel – dsk Jun 15 '20 at 08:06
  • I am running the spark commands using spark-shell. I have tried this command to add package but no luck spark-shell --packages com.crealytics:spark-excel_2.11:0.13.1 – user13748181 Jun 15 '20 at 08:25
  • Getting this error while runnig --package command. Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.crealytics#spark-excel_2.11;0.13.1: not found] – user13748181 Jun 15 '20 at 08:42
  • Writing the multiple frame to 1 excel file but in different sheet using below command but i am getting error while writing to different sheet file already exists. df7.coalesce(1).write.format("com.crealytics.spark.excel") .option("dataAddress", "'Second'!B1:M300") .option("useHeader","false") .save(s"${desPath}/${snapshotDate}/BIS.xlsx") df7.coalesce(1).write.format("com.crealytics.spark.excel") .option("dataAddress", "'First'!B1:M300") .option("useHeader","false") .save(s"${desPath}/${snapshotDate}/BIS.xlsx") – user13748181 Jun 16 '20 at 13:42
0

Spark can't directly do this while writing as a csv, There is no option as sheetName, The output path is path you mention as .csv("path").

Spark uses hadoops file format, which is partitioned in multiple part files under the output path, 1 part file on your case. Also do not repartitions to 1 unless you really need it.

One thing you can do is write the dataframe without repartition and use HADOOP API to merge those small many part files to single.
Here is more on detail Write single CSV file using spark-csv

koiralo
  • 22,594
  • 6
  • 51
  • 72
  • I have checked that link but i am using S3 as data storage and i don't mind to use repartition or coalesce bec my data is very small in size. But i am not able to write 3 dataframe to 3 different sheet in 1 csv file. – user13748181 Jun 15 '20 at 08:29
  • I am not using hdfs path. – user13748181 Jun 15 '20 at 08:39
  • My question is how to change the sheetName in csv file and write 3 dataframe to 3 different sheet in 1 csv file – user13748181 Jun 15 '20 at 08:41
  • Nevermind, If you don't want to understand how spark works, There is no such thing sheet name in spark-csv options, And if you have very small size of data, I think you should consider other solution than using spark. – koiralo Jun 15 '20 at 08:49
  • if you don't have ans of my question that's ok . – user13748181 Jun 15 '20 at 08:54
  • Let me know when you find it, except 1. renaming your part files names or Creating your own custom Output format. – koiralo Jun 15 '20 at 08:57
  • Sure. Thanks for your time. I am really appreciated it – user13748181 Jun 15 '20 at 09:04
  • df7.coalesce(1).write.format("com.crealytics.spark.excel") .option("dataAddress", "'Second'!B1:M300") .option("useHeader","false") .save(s"${desPath}/${snapshotDate}/BIS.xlsx") df7.coalesce(1).write.format("com.crealytics.spark.excel") .option("dataAddress", "'First'!B1:M300") .option("useHeader","false") .save(s"${desPath}/${snapshotDate}/BIS.xlsx") If i write for more than 1 sheet i am getting the error file BIS.xlsx already exists. – user13748181 Jun 16 '20 at 13:38
0

We can only 1 default sheet in csv file if we want multiple sheet then we should write the dataframe to excel format instead of csv file format.

  • Writing the multiple dataframe to 1 excel file but in different sheet using below command but getting error. Error : Exception in thread "main" java.lang.RuntimeException: path /BIS.xlsx already exists. DataFrame1: df7.coalesce(1).write.format("com.crealytics.spark.excel") .option("dataAddress", "'Second'!B1:M300") .option("useHeader","false") .save(s"${desPath}/${snapshotDate}/BIS.xlsx") DataFrame2: df7.coalesce(1).write.format("com.crealytics.spark.excel") .option("dataAddress", "'First'!B1:M300") .option("useHeader","false") .save(s"${desPath}/${snapshotDate}/BIS.xlsx") – user13748181 Jun 16 '20 at 14:28