0

I have the following code

ageDF.sort('Period')
     .groupBy('Period')
     .agg(round(sum('Age_specific_birth_rate'), 2).alias('Total Births'))
     .show()

The above groups the sum of age_specific_birth_rate by Period

So the output will be like

Period|Total Births|
+------+------------+
|  2000|       395.5|
|  2001|       393.4|
|  2002|       377.3|
|  2003|       386.2|
|  2004|       395.9|
|  2005|       391.9|
|  2006|       400.4|
|  2007|       434.0|
|  2008|       437.8|
|  2009|       425.7|
|  2010|       434.0|
|  2011|       417.8|
|  2012|       418.2|
|  2013|       400.4|
|  2014|       384.3|
|  2015|       398.7|
|  2016|       374.8|
|  2017|       362.7|
|  2018|       342.2|

But I wanna display the maximum among this by Period

so when I type in the follwing code

  ageDF.sort('Period')
       .groupBy('Period')
       .agg(round(sum('Age_specific_birth_rate'), 2).alias('Total'))
       .select('Period', 'Total')
       .agg(max('Total'))
       .show()

I get the output

> +----------+
  |max(Total)|
  +----------+
  |     437.8| 
  +----------+

But I wanna get something like

  +------+------------+
  |Period|max(Total)  |
  +------+------------+
  |  2008|       395.5|

What shuold I do ?

Thank you

howie
  • 2,587
  • 3
  • 27
  • 43
Rudy
  • 19
  • 2
  • 1
  • 1
    That's a common problem. You want to output max value and the line that contains max value. Alternative way is for-loop your data and compare each one with max value, if they are equal then output this. There is probably multi answer. – MoreFreeze Mar 08 '19 at 06:23
  • Can you put a small initial dataset as an example and the output expected for that dataset to be able to reproduce and understand the case? – Daniel Sobrado Mar 08 '19 at 06:25
  • Possible duplicate of [GroupBy column and filter rows with maximum value in Pyspark](https://stackoverflow.com/questions/48829993/groupby-column-and-filter-rows-with-maximum-value-in-pyspark) – pault Mar 08 '19 at 14:51

1 Answers1

0

You can try

ageDF.sort('Period')
     .groupBy('Period')
     .agg(round(sum('Age_specific_birth_rate'), 2).alias('Total'))
     .orderBy(functions.col('Total').desc())
     .limit(1)
     .select('Period', 'Total')
     .show()    
howie
  • 2,587
  • 3
  • 27
  • 43