Pyspark - from long to wide with new column names

Question

I have this dataframe:

data = [{"name": "test", "sentiment":'positive', "avg":13.65, "stddev":15.24},
{"name": "test", "sentiment":'neutral', "avg":338.74, "stddev":187.27},
{"name": "test", "sentiment":'negative', "avg":54.58, "stddev":50.19}]

df = spark.createDataFrame(data).select("name", "sentiment", "avg", "stddev")
df.show()
      +----+---------+------+------+
      |name|sentiment|   avg|stddev|
      +----+---------+------+------+
      |test| positive| 13.65| 15.24|
      |test|  neutral|338.74|187.27|
      |test| negative| 54.58| 50.19|
      +----+---------+------+------+

I'd like to create a dataframe with this structure:

+----+------------+-----------+------------+------------+-----------+------------+
|name|avg_positive|avg_neutral|avg_negative|std_positive|std_neutral|std_negative|
+----+------------+-----------+------------+------------+-----------+------------+
|test|       13.65|     338.74|       54.58|       15.24|     187.27|       50.19|
+----+------------+-----------+------------+------------+-----------+------------+

I also don't know the name of this operation, feel free to suggest a proper title. Thanks!

Does this answer your question? [How to pivot Spark DataFrame?](https://stackoverflow.com/questions/30244910/how-to-pivot-spark-dataframe) — Kafels, Jul 05 '21 at 15:23
This is pivot function - This is one example of multiple columns pivoting https://stackoverflow.com/questions/45035940/how-to-pivot-on-multiple-columns-in-spark-sql — Rafa, Jul 05 '21 at 15:28
@Skupak - will appreciate if you can help accept my answer :) thanks in advance — dsk, Jul 06 '21 at 07:45

score 1 · Accepted Answer · answered Jul 05 '21 at 15:31

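This operation is called a pivot. Use groupBy() and pivot(): group by name, pivot on sentiment, and take the first value of each metric.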

    from pyspark.sql import functions as F

    # One row per name; every sentiment value becomes an avg/stddev column pair
    df_grp = df.groupBy("name").pivot("sentiment").agg(F.first("avg").alias("avg"), F.first("stddev").alias("stddev"))
    df_grp.show()

    +----+------------+---------------+-----------+--------------+------------+---------------+
    |name|negative_avg|negative_stddev|neutral_avg|neutral_stddev|positive_avg|positive_stddev|
    +----+------------+---------------+-----------+--------------+------------+---------------+
    |test|       54.58|          50.19|     338.74|        187.27|       13.65|          15.24|
    +----+------------+---------------+-----------+--------------+------------+---------------+

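Spark names the pivoted columns as the pivot value followed by the aggregation alias (negative_avg, negative_stddev, and so on). Rename them afterwards if you really want the avg_positive style from the question.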
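One way to do the rename is to build the new names in plain Python first. This is only a sketch: the helper swap_pivot_name is hypothetical (not part of PySpark), the column list is copied from the output above, and the split assumes the sentiment values themselves contain no underscore.

```python
def swap_pivot_name(col, key_cols=("name",), sep="_"):
    """Turn 'negative_avg' into 'avg_negative'; leave key columns untouched."""
    if col in key_cols:
        return col
    # Split on the first separator only: '<sentiment>_<stat>'
    sentiment, stat = col.split(sep, 1)
    return f"{stat}{sep}{sentiment}"


# Column names as produced by the pivot above (in practice: df_grp.columns)
pivot_cols = [
    "name",
    "negative_avg", "negative_stddev",
    "neutral_avg", "neutral_stddev",
    "positive_avg", "positive_stddev",
]

new_cols = [swap_pivot_name(c) for c in pivot_cols]
print(new_cols)
```

The new names can then be applied positionally with df_grp.toDF(*new_cols), followed by a select() if you also want the avg columns grouped before the stddev columns. Note that this keeps the stddev prefix; mapping it to std as in the question would need one extra replacement.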
