
I have a Spark DataFrame and I want to aggregate values by multiple keys.

As the Spark documentation suggests:

 def groupBy(col1: String, cols: String*): GroupedData

Groups the DataFrame using the specified columns, so we can run aggregation on them.

So I do the following:

 val keys = Seq("a", "b", "c")
 dataframe.groupBy(keys:_*).agg(...)

IntelliJ IDEA shows me the following errors:

  1. Expansion for non-repeated parameters
  2. Type mismatch: expected Seq[Column], actual Seq[String]

However, I can pass multiple arguments manually without errors:

dataframe.groupBy("a", "b", "c").agg(...)

So, my question is: How can I do this programmatically?


1 Answer


Either use Columns with groupBy(cols: Column*):

import org.apache.spark.sql.functions.col

val keys = Seq("a", "b", "c").map(col)
dataframe.groupBy(keys: _*).agg(...)

or head / tail with groupBy(col1: String, cols: String*):

val keys = Seq("a", "b", "c") 
dataframe.groupBy(keys.head, keys.tail: _*).agg(...)  
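
For reference, here is a minimal end-to-end sketch that runs both variants side by side. The local SparkSession setup, the sample rows, and the sum("d") aggregation are assumptions made for illustration; the answer above leaves the aggregation as agg(...).

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object GroupByKeysExample extends App {
  // Assumed: a local SparkSession, only for the sake of this sketch
  val spark = SparkSession.builder()
    .appName("groupBy-multiple-keys")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample data: grouping keys a, b, c and a value column d
  val dataframe = Seq(
    ("x", "y", "z", 1),
    ("x", "y", "z", 2),
    ("x", "w", "z", 3)
  ).toDF("a", "b", "c", "d")

  val keys = Seq("a", "b", "c")

  // Variant 1: map the names to Columns, matching groupBy(cols: Column*)
  dataframe.groupBy(keys.map(col): _*).agg(sum("d")).show()

  // Variant 2: split the Seq, matching groupBy(col1: String, cols: String*)
  dataframe.groupBy(keys.head, keys.tail: _*).agg(sum("d")).show()

  spark.stop()
}

Both calls produce the same grouped result; the only difference is which groupBy overload the arguments are made to fit.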