
I am using Python with the PySpark framework, and I am trying to apply different aggregations to different columns using groupBy.

I have a df with columns col1, col2, col3, and col4. I want to do something like:

```python
df.groupby("col1").sum("col2", "col3").avg("col4")
```

But I am getting an error:

```
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1301, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'avg'
```

  • Possible duplicate of [multiple criteria for aggregation on pySpark Dataframe](https://stackoverflow.com/questions/40274508/multiple-criteria-for-aggregation-on-pyspark-dataframe) and [Multiple Aggregate operations on the same column of a spark dataframe](https://stackoverflow.com/questions/34954771/multiple-aggregate-operations-on-the-same-column-of-a-spark-dataframe) and [Spark SQL: apply aggregate functions to a list of columns](https://stackoverflow.com/questions/33882894/spark-sql-apply-aggregate-functions-to-a-list-of-columns) – pault Sep 27 '19 at 17:01

1 Answer


This is how I do it in my modules:

```python
import pyspark.sql.functions as F

# One aggregate expression per column, all inside a single agg() call
df2 = df.groupBy("col1").agg(F.sum("col2"), F.sum("col3"), F.avg("col4"))
```
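Note that `agg` gives the output columns generated names like `sum(col2)` and `avg(col4)`. Here is a minimal sketch of the same aggregation with explicit output names via `alias` (the `*_sum`/`*_avg` names are just illustrative):

```python
import pyspark.sql.functions as F

# Same groupBy/agg as above, but each aggregate gets a readable output name
df2 = df.groupBy("col1").agg(
    F.sum("col2").alias("col2_sum"),
    F.sum("col3").alias("col3_sum"),
    F.avg("col4").alias("col4_avg"),
)
```

If you only need built-in aggregates, `agg` also accepts a dict, e.g. `df.groupBy("col1").agg({"col2": "sum", "col3": "sum", "col4": "avg"})`, although that form does not let you rename the columns inline.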

– Harmeet