
Say I have a dataframe that contains cars, their brand, and their price. I would like to replace the avg below with the median (or another percentile):

df.groupby('carBrand').agg(F.avg('carPrice').alias('avgPrice'))

However, it seems that there is no aggregation function that allows one to compute this in Spark.

Greg

1 Answer


You can try the approxQuantile function (see the pyspark.sql documentation: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql.functions).

Assaf Mendelson