
In Spark Scala, how can I apply multiple aggregations to a single groupBy on a DataFrame? For example:

val grouped = df.groupBy("firstName", "lastName").sum("Amount")

But what if I need count, sum, max, etc.?

/* The following does not work, but it illustrates the intent:
val grouped = df.groupBy("firstName", "lastName").sum("Amount").count().toDF()
*/

Desired output of grouped.show():

-----------------------------------------------------
| firstName | lastName | Amount | count | Max | Min |
-----------------------------------------------------
– user2458922
  • Compute the max age and average salary, grouped by department and gender: ds.groupBy($"department", $"gender").agg(Map("salary" -> "avg", "age" -> "max")). See groupBy in the documentation for examples: https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.sql.Dataset (a runnable sketch of this Map form follows these comments) – C.S.Reddy Gadipally Jun 17 '19 at 18:22
  • @user10958683 True, it is a duplicate, but Zack's answer is more readable. – user2458922 Jun 17 '19 at 19:05
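
To make the Map-based form from the comment above concrete, here is a minimal sketch, assuming a local SparkSession; the department/gender/salary/age columns and the rows are illustrative, taken from the comment's example rather than the question:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Illustrative rows matching the comment's column names.
val ds = Seq(
  ("eng", "f", 90000, 34),
  ("eng", "m", 80000, 41)
).toDF("department", "gender", "salary", "age")

// Map form: one aggregate per column. A Map cannot hold two entries for
// the same key, so use agg(sum(...), max(...), ...) when you need several
// aggregates of the same column, as in the answer below.
val byDeptGender = ds.groupBy($"department", $"gender")
  .agg(Map("salary" -> "avg", "age" -> "max"))

byDeptGender.show()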

1 Answer

import org.apache.spark.sql.functions._
import spark.implicits._ // required for .toDF on a local Seq

case class soExample(firstName: String, lastName: String, Amount: Int)
val df = Seq(soExample("me", "zack", 100)).toDF

// Pass all the aggregate expressions to a single agg() call.
val grouped = df.groupBy("firstName", "lastName").agg(
  sum("Amount"),
  mean("Amount"),
  stddev("Amount"),
  count(lit(1)).alias("numOfRecords")
)

grouped.show() // or display(grouped) in a Databricks notebook
– Zack
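
To produce the exact column headers shown in the question's desired output, the same pattern with max, min, and aliases would look roughly like this; the alias names are my choice, mirroring the question's table, and are not part of the answer above:

// Shaped to the question's desired headers; alias names are illustrative.
val shaped = df.groupBy("firstName", "lastName").agg(
  sum("Amount").alias("Amount"),
  count(lit(1)).alias("count"),
  max("Amount").alias("Max"),
  min("Amount").alias("Min")
)
shaped.show()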