I have a very large DataFrame with 8000 columns and 50000 rows.
I want to write its summary statistics to an Excel file.
I think we can use the describe() method, but how do I write the result to Excel in a readable format? Thanks.
- Excel can open a CSV (comma-separated values) file as an ordinary spreadsheet. So the easiest thing is to just print any output as comma-separated values and then you can just open it with Excel. – Robert Dodier Apr 21 '17 at 17:03
- True, but best to convert it to a pandas dataframe first so you don't have to worry about part files – David Apr 21 '17 at 17:32
1 Answer
The return type of describe() is a PySpark DataFrame. The easiest way to get the describe() output into an Excel-readable format is to convert it to a pandas DataFrame and then write that out as a CSV file, as below:
import pandas
df.describe().toPandas().to_csv('fileOutput.csv')
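If you want it in Excel format, you can try the following:
import pandas
df.describe().toPandas().to_excel('fileOutput.xlsx', sheet_name='Sheet1', index=False)
Note: writing .xlsx requires an Excel writer engine such as openpyxl (pip install openpyxl on the command line). The older .xls format needs the xlwt package, which pandas 2.0 and later no longer support, and an .xls sheet holds at most 256 columns, fewer than the describe() output of an 8000-column DataFrame.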
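With 8000 columns, the describe() output is only five rows tall (count, mean, stddev, min, max) but about 8000 columns wide, which is hard to read in a spreadsheet. One option is to transpose it so that each original column becomes a row. The following is a minimal sketch, not part of the original answer: the helper name tidy_summary is hypothetical, and it assumes the input is the pandas DataFrame returned by df.describe().toPandas(), where the first column is named 'summary' and the statistics arrive as strings.

```python
import pandas as pd


def _to_numeric_if_possible(col):
    # Convert a column of string statistics to numbers when every
    # value parses; otherwise leave it as-is (e.g. min/max of text columns).
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col


def tidy_summary(summary_pdf):
    """Transpose a describe() result: one row per original column.

    summary_pdf is assumed to look like df.describe().toPandas():
    a 'summary' column holding the statistic names, followed by one
    column of string values per original DataFrame column.
    """
    tidy = summary_pdf.set_index("summary").T
    tidy = tidy.apply(_to_numeric_if_possible)
    tidy.index.name = "column"   # label for the original column names
    tidy.columns.name = None     # drop the leftover 'summary' label
    return tidy


# Hypothetical usage with a Spark DataFrame named df (needs openpyxl):
# tidy_summary(df.describe().toPandas()).to_excel("fileOutput.xlsx")
```

Here the index is kept when writing (no index=False), because after transposing it holds the original column names.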

David
- Thanks for the reply. I had tried this, but the output in the CSV file does not look very user-friendly or readable, so I wanted it in Excel format. Thanks – Ajg Apr 21 '17 at 17:51
- Can we use the above to write DataFrames to multiple tabs in an Excel workbook? – Bharath Feb 26 '18 at 16:37
- You'd have to refactor the code a bit. See https://stackoverflow.com/questions/14225676/save-list-of-dataframes-to-multisheet-excel-spreadsheet – David Feb 26 '18 at 19:05
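For the multi-tab case, one possible refactoring uses pandas.ExcelWriter, which lets several pandas DataFrames be written to one workbook. This is only a sketch under assumptions: the helper names safe_sheet_name and write_sheets are hypothetical, the inputs are pandas DataFrames (for example, results of toPandas()), and an Excel engine such as openpyxl is installed.

```python
import re

import pandas as pd


def safe_sheet_name(name):
    # Excel sheet names may not contain [ ] : * ? / \ and are
    # limited to 31 characters, so replace and truncate.
    cleaned = re.sub(r"[\[\]:*?/\\]", "_", str(name))
    return cleaned[:31]


def write_sheets(frames, path):
    """Write each pandas DataFrame in `frames` to its own sheet.

    frames: dict mapping a sheet name to a pandas DataFrame.
    path:   output file, e.g. 'fileOutput.xlsx'.
    Note: two names that become identical after cleaning or
    truncation would collide; this sketch does not handle that.
    """
    with pd.ExcelWriter(path) as writer:
        for name, frame in frames.items():
            frame.to_excel(writer, sheet_name=safe_sheet_name(name), index=False)


# Hypothetical usage with two Spark DataFrames df1 and df2:
# write_sheets(
#     {"first": df1.describe().toPandas(), "second": df2.describe().toPandas()},
#     "fileOutput.xlsx",
# )
```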