val trans = df.groupBy("userId").agg(collect_list("movieId") as "features")

How do I aggregate other columns in the DataFrame as well? For now, it only aggregates the movieId column.

vielkind
  • Please add some example of input/output to make it clear. – pheeleeppoo Oct 04 '18 at 15:42
  • See https://stackoverflow.com/questions/42850554/apache-spark-dataframe-groupby-agg-for-multiple-columns/42850745 and https://stackoverflow.com/questions/33882894/sparksql-apply-aggregate-functions-to-a-list-of-column – Tzach Zohar Oct 04 '18 at 15:43
  • Possible duplicate of [Apache Spark Dataframe Groupby agg() for multiple columns](https://stackoverflow.com/questions/42850554/apache-spark-dataframe-groupby-agg-for-multiple-columns) – pault Oct 04 '18 at 16:34
  • Possible duplicate of [SparkSQL: apply aggregate functions to a list of column](https://stackoverflow.com/questions/33882894/sparksql-apply-aggregate-functions-to-a-list-of-column) – zero323 Oct 04 '18 at 17:09

1 Answer

You can pass additional aggregate expressions to agg, separated by commas. For example, to also compute the average rating per user:

import org.apache.spark.sql.functions.{avg, collect_list}
val trans = df.groupBy("userId").agg(collect_list("movieId") as "features", avg("rating") as "avg_rating")
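
If you have many columns to aggregate, it can be easier to build the expressions as a list and pass them to agg in one go. Here is a minimal sketch of that, written so it can be pasted into spark-shell. The sample rows, the "timestamp" column and the "last_seen" alias are made up for illustration, since the question does not show the real schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col, collect_list, max}

// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("multi-agg").getOrCreate()
import spark.implicits._

// Hypothetical sample data: only userId, movieId and rating appear in the
// question; the timestamp column is an assumption for this example.
val df = Seq(
  (1, 10, 4.0, 100L),
  (1, 20, 2.0, 200L),
  (2, 10, 5.0, 300L)
).toDF("userId", "movieId", "rating", "timestamp")

// One aggregate expression per output column.
val aggExprs = Seq(
  collect_list(col("movieId")).as("features"),
  avg(col("rating")).as("avg_rating"),
  max(col("timestamp")).as("last_seen")
)

// agg takes a first Column plus varargs, so split the list into head and tail.
val trans = df.groupBy("userId").agg(aggExprs.head, aggExprs.tail: _*)

trans.show(false)
```

The result has one row per userId, with one column for each expression in aggExprs. Adding another aggregation is then just one more entry in the list.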