How to aggregate values in Spark?

Question

val trans = df.groupBy("userId").agg(collect_list("movieId") as "features")

How do I aggregate the other columns in the DataFrame as well? For now, it only aggregates the movieId column.

See https://stackoverflow.com/questions/42850554/apache-spark-dataframe-groupby-agg-for-multiple-columns/42850745 and https://stackoverflow.com/questions/33882894/sparksql-apply-aggregate-functions-to-a-list-of-column — Tzach Zohar, Oct 04 '18 at 15:43
Possible duplicate of [Apache Spark Dataframe Groupby agg() for multiple columns](https://stackoverflow.com/questions/42850554/apache-spark-dataframe-groupby-agg-for-multiple-columns) — pault, Oct 04 '18 at 16:34
Possible duplicate of [SparkSQL: apply aggregate functions to a list of column](https://stackoverflow.com/questions/33882894/sparksql-apply-aggregate-functions-to-a-list-of-column) — zero323, Oct 04 '18 at 17:09

score 0 · Answer 1 · answered Oct 04 '18 at 18:02


You can pass additional aggregations to the same agg call, as in the example below:

val trans = df.groupBy("userId").agg(collect_list("movieId") as "features", avg("rating") as "avg_rating")
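If the goal is to aggregate every remaining column rather than naming each one, one possible approach is to build the list of aggregation expressions from df.columns. The following is a minimal sketch, not a definitive implementation: the object name AggregateSketch, the helper collectAllBy, the "_list" suffix, and the sample data are all assumptions made for illustration, and it assumes collect_list is the aggregation you want for every non-key column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list}

// Hypothetical helper object; the names here are illustrative only.
object AggregateSketch {

  // Groups by `key` and collects every other column into a list column.
  def collectAllBy(df: DataFrame, key: String): DataFrame = {
    // One collect_list expression per non-key column.
    val aggs = df.columns
      .filter(_ != key)
      .map(c => collect_list(col(c)).as(s"${c}_list"))

    require(aggs.nonEmpty, s"no columns to aggregate besides '$key'")

    // agg takes a first Column plus varargs, hence head/tail.
    df.groupBy(key).agg(aggs.head, aggs.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("agg-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data with the column names from the question.
    val df = Seq(
      (1, 10, 4.0),
      (1, 20, 3.0),
      (2, 10, 5.0)
    ).toDF("userId", "movieId", "rating")

    collectAllBy(df, "userId").show(false)

    spark.stop()
  }
}
```

Note that collect_list skips nulls and does not guarantee element order, so the lists for different columns may not line up row by row. If you need to keep values paired, collecting a struct of the columns is probably safer.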


user3739478
