I am trying to use pivot in Apache Spark.
My data is:
+--------------------+---------+
|           timestamp|     user|
+--------------------+---------+
|2017-12-19T00:41:...|   User_1|
|2017-12-19T00:01:...|   User_2|
|2017-12-19T00:01:...|   User_1|
|2017-12-19T00:01:...|   User_1|
|2017-12-19T00:01:...|   User_2|
+--------------------+---------+
I want to pivot on the user column.
But I keep getting the error:
'DataFrame' object has no attribute 'pivot'
Traceback (most recent call last):
File "/usr/hdp/current/spark2-client/python/pyspark/sql/dataframe.py", line 1020, in __getattr__
"'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'pivot'
This happens no matter how I call it, e.g. df.groupBy('A').pivot('B') or df.pivot('B').
My actual query is:
# The Pivot operation will give timestamp vs Users data
pivot_pf = tf.groupBy(window(tf["timestamp"], "2 minutes"), 'user').count().select('window.start', 'user', 'count').pivot("user").sum("count")
Any help is greatly appreciated.
Thanks.