I have a dataframe with several numeric columns that are not fixed (they can change between executions). Let's say I have a Seq with the names of the numeric columns. I would like to apply an aggregation function to each of these columns. I have tried the following:
println(numeric_cols)
// -> Seq[String] = List(avgTkts_P1, avgTkts_P2, avgTkts_P3, avgTkts_P4)
var sum_ops = for (c <- numeric_cols) yield org.apache.spark.sql.functions.sum(c).as(c)
var avgTktsPerPeriodo = df.groupBy("ID").agg(sum_ops: _*)
But it gives me the following error:
scala> var avgTktsPerPeriodo = df.groupBy("ID").agg(sum_ops:_*)
<console>:79: error: overloaded method value agg with alternatives:
(expr: org.apache.spark.sql.Column,exprs: org.apache.spark.sql.Column*)org.apache.spark.sql.DataFrame <and>
(exprs: java.util.Map[String,String])org.apache.spark.sql.DataFrame <and>
(exprs: scala.collection.immutable.Map[String,String])org.apache.spark.sql.DataFrame <and>
(aggExpr: (String, String),aggExprs: (String, String)*)org.apache.spark.sql.DataFrame
cannot be applied to (org.apache.spark.sql.Column)
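Looking at the first alternative in the error, (expr: Column, exprs: Column*), my guess is that it wants a leading Column followed by varargs, so maybe splitting the Seq into head and tail would compile (just my guess, I don't know if this is the right or idiomatic way):

import org.apache.spark.sql.functions.sum
// build one sum(...) column per numeric column name
val sum_ops = numeric_cols.map(c => sum(c).as(c))
// pass the first column explicitly and the rest as varargs
val avgTktsPerPeriodo = df.groupBy("ID").agg(sum_ops.head, sum_ops.tail: _*)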
Any idea if this is doable in Spark/Scala, and what the proper way is to apply a dynamic Seq of aggregations?