
I have a DataFrame read from a CSV file. It's essentially a score table with 4 columns:

school_name, class_name, student_name, score

What I want to do is group by school and class and get the top 3 scores within each class. I'm trying it this way:

val df = spark.read.format("csv")
  .option("sep", ",")
  .option("inferSchema", "true")
  .option("header", "true")
  .load("students.csv")

df.groupBy("school_name", "class_name")....

And I'm just stuck here.

Any advice?

EDIT: It's not the overall top 3 scores, but the top 3 scores within each class.
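For reference, here is a sketch of the window-function approach I've been looking at (assuming the column names above and that `df` is the DataFrame loaded from the CSV; `row_number` keeps exactly 3 rows per group, dropping ties):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Rank rows within each (school, class) partition, highest score first
val byClass = Window
  .partitionBy("school_name", "class_name")
  .orderBy(col("score").desc)

// Keep only the 3 best-ranked rows in every class
val top3 = df
  .withColumn("rank", row_number().over(byClass))
  .filter(col("rank") <= 3)
  .drop("rank")

top3.show()
```

If ties should all be kept (so a class could return more than 3 rows), `dense_rank()` could be swapped in for `row_number()`.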

Bomin
