I would like to select the item that has the greatest value. For example, in this table I would like to select MAC09:
Identifiant | Val |
---|---|
MAC26 | 36 |
MAC10 | 9 |
MAC02 | 2 |
MAC32 | 11 |
MAC09 | 37 |
MAC28 | 10 |
There are several ways of doing it. Here is a solution using a rank:
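from pyspark.sql import functions as F, Window

# Rank rows by Val in descending order, so the greatest value gets rank 1.
# rank() keeps ties: every row sharing the maximum Val is returned.
w = Window.orderBy(F.col("Val").desc())
df.withColumn("rnk", F.rank().over(w)).where("rnk = 1").drop("rnk").show()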
+-----------+---+
|Identifiant|Val|
+-----------+---+
|      MAC09| 37|
+-----------+---+