
I have a very large DataFrame with 8000 columns and 50,000 rows, and I want to write its summary statistics to an Excel file. I think I can use the describe() method, but how do I write the result to Excel in a readable format? Thanks

David
Ajg
  • Excel can open a CSV (comma-separated values) file as an ordinary spreadsheet, so the easiest approach is to write the output as comma-separated values and open that file in Excel. – Robert Dodier Apr 21 '17 at 17:03
  • True, but it's best to convert it to a pandas DataFrame first so you don't have to deal with part files (Spark writes CSV output as a directory of part files). – David Apr 21 '17 at 17:32

1 Answer


The return type of describe() is a PySpark DataFrame. The easiest way to get it into an Excel-readable format is to convert it to a pandas DataFrame with toPandas() and then write that out as a CSV file:

import pandas  # toPandas() needs pandas installed on the driver
# describe() returns only a few rows (count, mean, stddev, min, max), so toPandas() collects little data
df.describe().toPandas().to_csv('fileOutput.csv', index=False)
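PySpark's describe() returns its statistics as string columns, and for 8000 input columns the result is 5 rows by 8001 columns (the statistics plus a 'summary' column), which is awkward to scan in a spreadsheet. A minimal sketch of tidying the pandas copy before writing it, assuming that layout: the helper names `tidy_describe` and `_maybe_numeric` are hypothetical. It parses numeric-looking columns as numbers and transposes so each original column becomes a row.

```python
import pandas as pd


def _maybe_numeric(col: pd.Series) -> pd.Series:
    """Parse a column of strings as numbers, or return it unchanged if any value fails."""
    converted = pd.to_numeric(col, errors="coerce")
    # errors="coerce" turns unparseable strings into NaN; if that created new NaNs,
    # the column held real text (e.g. min/max of a string column), so keep it as-is.
    if converted.isna().sum() > col.isna().sum():
        return col
    return converted


def tidy_describe(pdf: pd.DataFrame) -> pd.DataFrame:
    """Reshape the pandas copy of a PySpark describe() result for a spreadsheet.

    Assumes a 'summary' column naming each statistic and one
    string-typed column per input column (hypothetical helper).
    """
    # Use the statistic names as the row labels, then convert column by column.
    stats = pdf.set_index("summary").apply(_maybe_numeric)
    # Transpose: each original column becomes a row, each statistic a column,
    # so 8000 input columns give 8000 rows instead of 8000 spreadsheet columns.
    return stats.T.rename_axis("column")
```

With the answer's `df`, usage would be `tidy_describe(df.describe().toPandas()).to_csv('fileOutput.csv')`; the index is kept here because it now holds the original column names.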

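If you want a native Excel file instead, you can try:

import pandas
# .xlsx rather than .xls: an .xls sheet holds at most 256 columns, too few for
# describe() output on 8000 columns, and pandas 2.0 removed support for writing .xls
df.describe().toPandas().to_excel('fileOutput.xlsx', sheet_name='Sheet1', index=False)

Note: writing .xlsx requires the openpyxl package to be installed (pip install openpyxl on the command line). Older versions of pandas could write .xls files with the xlwt package, but that support was removed in pandas 2.0.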
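Excel worksheets have fixed size limits: the legacy .xls format allows 65,536 rows by 256 columns, and .xlsx allows 1,048,576 rows by 16,384 columns. A wide describe() result (8000 columns plus 'summary') fits in .xlsx but not in .xls, while its transpose fits in either. A small sketch of checking this before calling to_excel, assuming a plain single-level header and index; `EXCEL_LIMITS` and `fits_excel_sheet` are hypothetical names, not part of pandas.

```python
import pandas as pd

# Per-worksheet limits (rows, columns) for the two Excel file formats.
EXCEL_LIMITS = {
    ".xls": (65_536, 256),         # Excel 97-2003
    ".xlsx": (1_048_576, 16_384),  # Excel 2007 and later
}


def fits_excel_sheet(pdf: pd.DataFrame, ext: str, index: bool = False) -> bool:
    """Return True if pdf.to_excel(..., index=index) fits on one worksheet.

    Assumes a single-level header row and, if written, a single-level index.
    """
    max_rows, max_cols = EXCEL_LIMITS[ext]
    rows = len(pdf) + 1                        # +1 for the header row
    cols = pdf.shape[1] + (1 if index else 0)  # a written index adds one column
    return rows <= max_rows and cols <= max_cols
```

If the wide layout does not fit, writing the transposed frame with `index=True` keeps the original column names as row labels.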

David