I don't know Scala, but here is a PySpark solution that should be easy to convert to Scala. First you melt the DF, then you pivot it into the form you want. The pivot operation is built in, but we have to code the melt by hand. The idea is to create a "long" DF in which the values of the "T" columns sit in a single column, with an additional column telling us which of the "T" columns each value came from. An easy way to achieve this is to work on the underlying RDD with an application of flatMap. In what follows, I have taken the liberty of renaming the categorical column in your original DF to "category":
# these are the T columns that we want to "melt"
old_columns = ['T1', 'T2', 'T3']
# these columns will provide us with the columns of the transposed DF
# (kept general, so you could have multiple category columns)
new_columns = ['category']

# transform each RDD row into one melted row per T column,
# for use with flatMap
def reshape_row(row):
    melted_rows = []
    for oldcol in old_columns:
        melted_rows.append([row[newcol] for newcol in new_columns]
                           + [oldcol, row[oldcol]])
    return melted_rows
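To see concretely what this function does, here it is again as plain Python (reproduced so the snippet runs standalone, no Spark needed), applied to a single dict-like row with the values from your example:

```python
# Plain-Python illustration of the melt step: one "wide" row
# becomes one "long" row per T column.
old_columns = ['T1', 'T2', 'T3']
new_columns = ['category']

def reshape_row(row):
    melted_rows = []
    for oldcol in old_columns:
        melted_rows.append([row[newcol] for newcol in new_columns]
                           + [oldcol, row[oldcol]])
    return melted_rows

row = {'category': 'love', 'T1': 0.3, 'T2': 0.2, 'T3': 0.5}
print(reshape_row(row))
# [['love', 'T1', 0.3], ['love', 'T2', 0.2], ['love', 'T3', 0.5]]
```

flatMap then concatenates these per-row lists, so each original row contributes three rows to the melted DF.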
# melt the DF
df_melted = df.rdd.flatMap(reshape_row)\
              .toDF(schema=new_columns + ['T_category', 'T_value'])
df_melted.show()
+--------+----------+-------+
|category|T_category|T_value|
+--------+----------+-------+
| love| T1| 0.3|
| love| T2| 0.2|
| love| T3| 0.5|
| like| T1| 0.1|
| like| T2| 0.5|
| like| T3| 0.4|
| Sleep| T1| 0.1|
| Sleep| T2| 0.1|
| Sleep| T3| 0.8|
| Dance| T1| 0.3|
| Dance| T2| 0.4|
| Dance| T3| 0.3|
+--------+----------+-------+
Now we can pivot to get the answer. pivot requires an aggregate; since there is only one value per (T_category, category) cell here, f.max simply passes it through:

import pyspark.sql.functions as f

df_melted.groupby("T_category")\
         .pivot("category")\
         .agg(f.max("T_value"))\
         .show()
+----------+-----+-----+----+----+
|T_category|Dance|Sleep|like|love|
+----------+-----+-----+----+----+
| T3| 0.3| 0.8| 0.4| 0.5|
| T1| 0.3| 0.1| 0.1| 0.3|
| T2| 0.4| 0.1| 0.5| 0.2|
+----------+-----+-----+----+----+
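To make the pivot step concrete, here is the same group-by/pivot/max logic written out in plain Python over the melted rows (a sketch of what the Spark call computes, not how Spark implements it; the sample uses a subset of the rows above):

```python
# Plain-Python sketch of groupby("T_category").pivot("category").agg(max):
# for every (T_category, category) pair, keep the maximum T_value.
melted = [
    ['love', 'T1', 0.3], ['love', 'T2', 0.2], ['love', 'T3', 0.5],
    ['like', 'T1', 0.1], ['like', 'T2', 0.5], ['like', 'T3', 0.4],
]

def pivot_max(rows):
    table = {}
    for category, t_category, t_value in rows:
        cell = table.setdefault(t_category, {})
        cell[category] = max(cell.get(category, float('-inf')), t_value)
    return table

print(pivot_max(melted))
# {'T1': {'love': 0.3, 'like': 0.1},
#  'T2': {'love': 0.2, 'like': 0.5},
#  'T3': {'love': 0.5, 'like': 0.4}}
```

Each outer key becomes a row of the pivoted DF and each inner key becomes a column, which is exactly the shape of the table above.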