
I have a DataFrame with the columns 'prod_key', 'prod_name', 'Sales', and 'Volume'. I want to get all the descriptive statistics of the DataFrame.

from pyspark.sql import functions as F

groupby_cols = ['prod_key', 'prod_name']
funs = [F.mean, F.min, F.max, F.count]
aggregate_cols = ['Sales', 'Volume']

# One aggregation expression per (function, column) pair
exprs = [f(F.col(c)) for f in funs for c in aggregate_cols]
df_description = df.groupBy(*groupby_cols).agg(*exprs)

I got null values in the max function results. The min function works fine. Is there anything wrong with this? Thanks.
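In case a self-contained example helps, here is a minimal sketch of the same pattern with placeholder rows (the keys, names, and numbers below are made up, not my real data):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Toy rows for illustration only; one Volume value is deliberately null
data = [(1, 'apple', 10.0, 100.0),
        (1, 'apple', 20.0, 150.0),
        (2, 'banana', 5.0, None)]
df_toy = spark.createDataFrame(data, ['prod_key', 'prod_name', 'Sales', 'Volume'])

exprs = [f(F.col(c)) for f in [F.mean, F.min, F.max, F.count]
         for c in ['Sales', 'Volume']]
df_toy.groupBy('prod_key', 'prod_name').agg(*exprs).show()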

  • [How to make good reproducible Apache Spark Dataframe examples](https://stackoverflow.com/q/48427185/6910411) – zero323 Feb 02 '18 at 18:27

1 Answer

  df = df.withColumn("Sales",df["Sales"].cast("float"))\
   .withColumn("Volume",df["Volume"].cast("float"))

df.Sales and df.Volume were read in as strings, maybe because there are null values. After I changed the data type from string to float, it works fine.
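A sketch of the full flow, reusing the variable names from the question (df.printSchema() is one standard way to confirm the column types; the float cast is the fix described above):

from pyspark.sql import functions as F

df.printSchema()   # expect Sales and Volume to show up as string here

# Cast the numeric columns first, then rebuild the aggregation from the question
df = df.withColumn("Sales", df["Sales"].cast("float")) \
       .withColumn("Volume", df["Volume"].cast("float"))

exprs = [f(F.col(c)) for f in [F.mean, F.min, F.max, F.count]
         for c in ['Sales', 'Volume']]
df_description = df.groupBy('prod_key', 'prod_name').agg(*exprs)
df_description.show()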
