
I am using Python with the PySpark framework, and I am trying to apply different aggregations to different columns using groupBy.

I have a df with columns col1, col2, col3, and col4. I want to do something like:

```python
df.groupby("col1").sum("col2", "col3").avg("col4")
```

But I am getting an error:

```
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1301, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'avg'
```

  • Possible duplicate of [multiple criteria for aggregation on pySpark Dataframe](https://stackoverflow.com/questions/40274508/multiple-criteria-for-aggregation-on-pyspark-dataframe) and [Multiple Aggregate operations on the same column of a spark dataframe](https://stackoverflow.com/questions/34954771/multiple-aggregate-operations-on-the-same-column-of-a-spark-dataframe) and [Spark SQL: apply aggregate functions to a list of columns](https://stackoverflow.com/questions/33882894/spark-sql-apply-aggregate-functions-to-a-list-of-columns) – pault Sep 27 '19 at 17:01

1 Answer


This is how I do it in my modules:

```python
import pyspark.sql.functions as F

# One aggregate expression per column, all inside a single agg() call
df2 = df.groupBy("col1").agg(F.sum("col2"), F.sum("col3"), F.avg("col4"))
```
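Note that `agg` gives the output columns generated names like `sum(col2)` and `avg(col4)`. Here is a minimal sketch of the same aggregation with explicit output names via `alias` (the `*_sum`/`*_avg` names are just illustrative):

```python
import pyspark.sql.functions as F

# Same groupBy/agg as above, but each aggregate gets a readable output name
df2 = df.groupBy("col1").agg(
    F.sum("col2").alias("col2_sum"),
    F.sum("col3").alias("col3_sum"),
    F.avg("col4").alias("col4_avg"),
)
```

If you only need built-in aggregates, `agg` also accepts a dict, e.g. `df.groupBy("col1").agg({"col2": "sum", "col3": "sum", "col4": "avg"})`, although that form does not let you rename the columns inline.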

– Harmeet