I have a PySpark DataFrame that I'm grouping by one column, and I would then like to apply several different aggregation functions, including some custom ones, to different columns. Basically, this is what I'd like to do (I know the syntax is all wrong; it's just an illustration):
fraction = UserDefinedFunction(lambda x: sum(x) * 100 / count(col4), DoubleType())
exprs = {x: "sum" for x in [col1, col2, col3]} | {x: "avg" for x in [col1, col3]} | {x: "fraction" for x in [col1, col2]}
df1 = df.groupBy(col5).agg(*exprs)
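
For reference, a toy stand-in for my data looks like this (the values are made up; only the column names match my real DataFrame):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Small made-up example with the same column names as above
df = spark.createDataFrame(
    [(1.0, 2.0, 3.0, 10.0, "a"),
     (4.0, 5.0, 6.0, 10.0, "a"),
     (7.0, 8.0, 9.0, 20.0, "b")],
    ["col1", "col2", "col3", "col4", "col5"])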
I tried different versions of this, such as

agg(sum(df.col1, df.col2, df.col3), avg(df.col1, df.col3), fraction(df.col1, df.col2))

but nothing works.
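
My guess is that I need to build a flat list with one expression per column/function pair. Something like the sketch below at least runs for the built-in aggregates (using the toy df above), but I still don't see how to fit my custom fraction into it:

import pyspark.sql.functions as F

# One expression per (column, function) pair for the built-ins
exprs = ([F.sum(c) for c in ["col1", "col2", "col3"]]
         + [F.avg(c) for c in ["col1", "col3"]])
df1 = df.groupBy("col5").agg(*exprs)
df1.show()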
I'd appreciate your help!