How to select the item that has the greatest value in a DataFrame in PySpark?

Question

I would like to select the item that has the greatest value. For example, in this table I would like to select MAC09:

Identifiant	Val
MAC26	36
MAC10	9
MAC02	2
MAC32	11
MAC09	37
MAC28	10

score 1 · Accepted Answer · answered Sep 30 '21 at 15:45

There are several ways of doing it. Here is a solution using rank:

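from pyspark.sql import functions as F, Window

# Rank rows by Val in descending order; rank 1 is the greatest value (ties share rank 1).
# Note: a window without partitionBy moves all rows to a single partition.
df.withColumn("rnk", F.rank().over(Window.orderBy(F.col("Val").desc()))).where(
    "rnk = 1"
).drop("rnk").show()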
+-----------+---+
|Identifiant|Val|
+-----------+---+
|      MAC09| 37|
+-----------+---+
