We can use the concat_ws function to concatenate all of the columns in a row, then use split to turn the result back into an array. Casting that array to array<int> makes the ordering numeric rather than lexicographic, and array_sort sorts the values so the second element can be extracted with [1] (array indexing in the expression is 0-based).
Example:
from pyspark.sql.functions import *
df=spark.createDataFrame([('83','32','14','62'),('63','32','74','55'),('13','88','6','46')],['Col1','Col2','Col3','Col4'])
df.selectExpr("array_sort(split(concat_ws(',',Col1,Col2,Col3,Col4),','))[1] Res").show()
#+---+
#|Res|
#+---+
#| 32|
#| 55|
#| 13|
#+---+
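For reference, the same expression can also be written with the DataFrame API functions that are already imported above; this is just an equivalent sketch, not a different result:

#same logic using Column functions instead of a SQL expression
df.select(array_sort(split(concat_ws(',','Col1','Col2','Col3','Col4'),',').cast('array<int>'))[1].alias('Res')).show()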
A more dynamic way, using * so every column is included without listing them:
df.selectExpr("array_sort(split(concat_ws(',',*),','))[1]").show()
#+---+
#|Res|
#+---+
#| 32|
#| 55|
#| 13|
#+---+
EDIT:
#add the Res column to the dataframe
df1=df.selectExpr("*","array_sort(cast(split(concat_ws(',',*),',') as array<int>))[1] Res")
df1.show()
#+----+----+----+----+---+
#|Col1|Col2|Col3|Col4|Res|
#+----+----+----+----+---+
#|  83|  32|  14|  62| 32|
#|  63|  32|  74|  55| 55|
#|  13|  88|   6|  46| 13|
#+----+----+----+----+---+
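As a side note, if only the numeric result is needed, the concat/split round trip can be skipped by building the array straight from the columns (a sketch, assuming every column holds a numeric string):

#collect the columns into an array, cast each value to int, then sort and take index 1
df.withColumn('Res', array_sort(array(*[col(c).cast('int') for c in df.columns]))[1]).show()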