I am using Python 2.6.6 and Spark 1.6.0. I have a DataFrame df
that looks like this:
id | name  | number
---|-------|-------
1  | joe   | 148590
2  | bob   | 148590
2  | steve | 279109
3  | sue   | 382901
3  | linda | 148590
Whenever I try to run something like
df2 = df.groupBy('id','length','type').pivot('id').agg(collect_list('name'))
I get the following error:
pyspark.sql.utils.AnalysisException: u'undefined function collect_list;'
Why is collect_list reported as undefined?
I have also tried:
hive_context = HiveContext(sc)
df2 = df.groupBy('id','length','type').pivot('id').agg(hive_context.collect_list('name'))
and get the error:
AttributeError: 'HiveContext' object has no attribute 'collect_list'