The solution described here (by zero323) is very close to what I want, with two twists:
- How do I do it in Java?
- What if the column holds a List of Strings instead of a single String, and I want to collect all of those lists into a single list after a groupBy on some other column?
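To make the second twist concrete, here is a rough sketch in plain Java (no Spark involved) of the result I am after. The `Row` class, the `collectFlattened` helper and the sample data are hypothetical names made up for illustration; this only shows the desired semantics, not how Spark would do it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupedListsSketch {

    // Hypothetical row type: a grouping key plus a list-of-strings column.
    static final class Row {
        final String key;
        final List<String> values;

        Row(String key, List<String> values) {
            this.key = key;
            this.values = values;
        }
    }

    // Groups rows by key and concatenates each group's lists into one list,
    // keeping the order in which rows were encountered.
    static Map<String, List<String>> collectFlattened(List<Row> rows) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Row row : rows) {
            result.computeIfAbsent(row.key, k -> new ArrayList<>()).addAll(row.values);
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative sample data only.
        List<Row> rows = Arrays.asList(
                new Row("a", Arrays.asList("x", "y")),
                new Row("b", Arrays.asList("z")),
                new Row("a", Arrays.asList("w")));

        // prints {a=[x, y, w], b=[z]}
        System.out.println(collectFlattened(rows));
    }
}
```

In other words, for each value of the grouping column I want one flat list, not a list of lists.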
I am using Spark 1.6 and have tried to use
org.apache.spark.sql.functions.collect_list(Column col)
as described in the solution to that question, but I got the following error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: undefined function collect_list;
    at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:65)
    at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:65)
    at scala.Option.getOrElse(Option.scala:121)