
I would like to select the item that has the greatest value. For example, in this table I would like to select MAC09:

Identifiant Val
MAC26 36
MAC10 9
MAC02 2
MAC32 11
MAC09 37
MAC28 10
elokema

1 Answer


There are several ways of doing it. Here is a solution using a rank:

from pyspark.sql import functions as F, Window


df.withColumn("rnk", F.rank().over(Window.orderBy(F.col("Val").desc()))).where(
    "rnk = 1"
).drop("rnk").show()
+-----------+---+                                                               
|Identifiant|Val|
+-----------+---+
|      MAC09| 37|
+-----------+---+
Steven