
I'm trying to put the minimum value of a few columns into a separate column (creating the min column). The operation is straightforward, but I wasn't able to find the right function for it:
A  B  min
1  2  1
2  1  1
3  1  1
1  4  1

Thanks a lot for your help!

1 Answer

You can use the least function. In PySpark:

from pyspark.sql.functions import least
df.withColumn('min', least('A', 'B')).show()
#+---+---+---+
#|  A|  B|min|
#+---+---+---+
#|  1|  2|  1|
#|  2|  1|  1|
#|  3|  1|  1|
#|  1|  4|  1|
#+---+---+---+

If you have a list of column names:

cols = ['A', 'B']
df.withColumn('min', least(*cols))

Similarly in Scala:

import org.apache.spark.sql.functions.least
df.withColumn("min", least($"A", $"B")).show
+---+---+---+
|  A|  B|min|
+---+---+---+
|  1|  2|  1|
|  2|  1|  1|
|  3|  1|  1|
|  1|  4|  1|
+---+---+---+

If the columns are stored in a Seq:

val cols = Seq("A", "B")
df.withColumn("min", least(cols.head, cols.tail: _*))