I have the following code:

from pyspark.sql import functions as func

cols = ("id","size")

result = df.groupby(*cols).agg({
    func.max("val1"),
    func.median("val2"),
    func.std("val2")
})

But it fails on the line func.median("val2") with a message that median cannot be found in func. The same happens with std.

Fluxy

1 Answer

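For the median, use an approximate quantile at 0.5. Note that approxQuantile is a DataFrame method (df.approxQuantile), so it cannot be called inside agg; within a grouped aggregation, the SQL function percentile_approx(val2, 0.5) does the same job and can be passed through func.expr.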

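For std, the function you are looking for is stddev, stddev_samp, or stddev_pop. stddev is an alias for stddev_samp (the sample standard deviation), while stddev_pop gives the population standard deviation. All of these are listed in the docs: https://spark.apache.org/docs/2.1.3/api/python/_modules/pyspark/sql/functions.html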
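Putting the two together, here is a minimal sketch of how the grouped aggregation from the question might look. The helper names agg_expressions and summarize, and the output aliases (max_val1, median_val2, std_val2), are made up for illustration; the column names come from the question. It also assumes the aggregates are passed to agg as unpacked arguments rather than inside braces, which would build a set.

```python
def agg_expressions(max_col="val1", stat_col="val2"):
    """Return Spark SQL aggregate expressions as plain strings.

    Hypothetical helper; the aliases are illustrative only.
    """
    return [
        f"max({max_col}) AS max_{max_col}",
        # percentile_approx at 0.5 gives an approximate median per group
        f"percentile_approx({stat_col}, 0.5) AS median_{stat_col}",
        # stddev_samp is the sample standard deviation (stddev is an alias)
        f"stddev_samp({stat_col}) AS std_{stat_col}",
    ]


def summarize(df, cols=("id", "size")):
    """Group df by cols and apply the aggregates above."""
    # Imported here so the string helper can be used without Spark installed.
    from pyspark.sql import functions as func

    # agg takes Column arguments (unpacked), not a set literal in braces.
    return df.groupby(*cols).agg(*[func.expr(e) for e in agg_expressions()])
```

With an existing DataFrame df, calling summarize(df) should return one row per (id, size) group. On newer releases (func.percentile_approx from Spark 3.1, func.median from Spark 3.4) the same thing can be written with those functions directly instead of func.expr.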

Jeff