I'm trying to replicate the solution to this question in PySpark (Spark < 2.3, so no map_keys): How to get keys and values from MapType column in SparkSQL DataFrame. Below is my code (using the same df as in the linked question):
import pyspark.sql.functions as F
distinctKeys = df\
    .select(F.explode("alpha"))\
    .select("key")\
    .distinct()\
    .rdd
df.select("id", distinctKeys.map(lambda x: "alpha".getItem(x).alias(x)))
However, this code gives the error: AttributeError: 'PipelineRDD' object has no attribute '_get_object_id'. Any thoughts on how to fix it?
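For anyone landing here with the same error, here is a minimal sketch of one possible workaround. As far as I can tell, the error comes from passing an RDD to select, which expects column names or Column objects; in addition, "alpha" in the lambda is a plain Python string and has no getItem method. The sketch assumes the set of distinct keys is small enough to collect to the driver. The helper name select_map_keys and its parameters are hypothetical, and it uses selectExpr("explode(...)"), which should yield the same key/value columns as F.explode.

```python
def select_map_keys(df, map_col="alpha", id_col="id"):
    """Return df with one column per distinct key of the map column.

    Hypothetical helper: assumes the distinct keys fit in driver memory.
    """
    # Explode the map into (key, value) rows and bring the distinct keys
    # back to the driver as ordinary Python values instead of an RDD.
    key_rows = (
        df.selectExpr("explode({})".format(map_col))
        .select("key")
        .distinct()
        .collect()
    )
    keys = sorted(row["key"] for row in key_rows)

    # df[map_col] is a Column, so getItem/alias are available on it.
    key_columns = [df[map_col].getItem(k).alias(k) for k in keys]

    # select accepts column names and Column objects, unpacked from the list.
    return df.select(id_col, *key_columns)
```

With the df from the linked question, this would be called as select_map_keys(df).show(). Sorting the keys is only there to make the column order deterministic.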