I have a DataFrame with the following values:
#+------+------+-----+
#| name1|name 2|score|
#+------+------+-----+
#|abcdef|abcghi|    3|
#|abcdef|abcjkl|    3|
#|abcdef|abcyui|    3|
#|abcdef|abrtyu|    4|
#|pqrstu|pqrswe|    2|
#|pqrstu|pqrsqw|    2|
#|pqrstu|pqrzxc|    3|
#+------+------+-----+
I need to group by name1 and keep the rows with the lowest score in each group.
I know I can get a single row per group by partitioning on name1, sorting by score in ascending order, and picking the first row. I do this with:
from pyspark.sql import Window
from pyspark.sql.functions import col, row_number
joined_windows = Window.partitionBy("name1").orderBy(col("score").asc())
result = joined_df.withColumn("rn", row_number().over(joined_windows)).where(col("rn") == 1).drop("rn")
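That keeps only one row per group, even when several rows tie for the lowest score. What I want instead is for the DataFrame to hold the following values, i.e. every row that has the lowest score in its group: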
#+------+------+-----+
#| name1|name 2|score|
#+------+------+-----+
#|abcdef|abcghi|    3|
#|abcdef|abcjkl|    3|
#|abcdef|abcyui|    3|
#|pqrstu|pqrswe|    2|
#|pqrstu|pqrsqw|    2|
#+------+------+-----+