I am grouping my Spark DataFrame by a field and trying to collect all the elements associated with each group/key into an array. I am using collect_list() inside .agg(), in Scala, like this:
val ndf = grp.agg(collect_list(col("site")))
Here grp is the result of grouping the DataFrame, and "site" is the column whose entries I am collecting.
This works when I run it in the spark-shell, but not when I run my entire code with spark-submit. I am importing:
import org.apache.spark.sql.functions._
which is where the collect_list method is defined.
Both Spark versions are the same. The only difference is that spark-shell initializes a HiveContext by default, whereas my flow does not create one. But from what I know, this has nothing to do with HiveContext.
What's the issue here? Someone also has the same issue here:
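For reference, here is a minimal sketch of the kind of job I am submitting. It is not my actual code: the object name, the "key" column, and the sample rows are hypothetical, and it assumes a plain SQLContext (Spark 1.6-style API), since my flow does not create a HiveContext.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

// Hypothetical stand-in for the real job; names and data are illustrative only.
object CollectSitesSketch {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("collect-sites-sketch"))

    // Assumption: a plain SQLContext, not the HiveContext that spark-shell creates.
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Made-up sample rows: (grouping key, site).
    val df = sc.parallelize(Seq(("a", "s1"), ("a", "s2"), ("b", "s3")))
      .toDF("key", "site")

    // Group by the key, then collect every "site" value of each group into an array.
    val grp = df.groupBy("key")
    val ndf = grp.agg(collect_list(col("site")))

    ndf.show()
    sc.stop()
  }
}
```

The only intended difference from what I run in the spark-shell is the context type used to build the DataFrame.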
https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.html