
I'm trying to put the minimum value of a few columns into a separate column (creating the min column). The operation is straightforward, but I wasn't able to find the right function for it:
A  B  min
1  2  1
2  1  1
3  1  1
1  4  1

Thanks a lot for your help!

1 Answer

You can use the least function. In PySpark:

from pyspark.sql.functions import least
df.withColumn('min', least('A', 'B')).show()
#+---+---+---+
#|  A|  B|min|
#+---+---+---+
#|  1|  2|  1|
#|  2|  1|  1|
#|  3|  1|  1|
#|  1|  4|  1|
#+---+---+---+

If you have a list of column names:

cols = ['A', 'B']
df.withColumn('min', least(*cols))

Similarly in Scala:

import org.apache.spark.sql.functions.least
df.withColumn("min", least($"A", $"B")).show
+---+---+---+
|  A|  B|min|
+---+---+---+
|  1|  2|  1|
|  2|  1|  1|
|  3|  1|  1|
|  1|  4|  1|
+---+---+---+

If the columns are stored in a Seq:

val cols = Seq("A", "B")
df.withColumn("min", least(cols.head, cols.tail: _*))