Suppose I've got a data frame :
+----+----+---+
| c1|name|qty|
+----+----+---+
|abc1| a| 1|
|abc2| a| 0|
|abc3| b| 3|
|abc4| b| 2|
+----+----+---+
I would like to get only rows with minimal qty
for every name
:
+----+----+---+
| c1|name|qty|
+----+----+---+
|abc2| a| 0|
|abc4| b| 2|
+----+----+---+
I am doing it like that:
df1 = df.groupBy('name').agg(sf.min('qty')).select("min(qty)")
df2 = df1.join(df, df1["min(qty)"] == df["qty"]).drop("min(qty)") // df2 is the result
It's working. I am wondering if it can be improved. How could you improve the solution above ?