
I have the following df:

time                 loc  amt
2012-07-20 01:00:00  A    3300
2012-01-04 17:29:00  B    300
2012-07-20 01:00:00  A    200
2012-01-04 17:29:00  B    500
2012-01-04 17:29:00  C    333

I would like to output the date that had the highest amt.

output: 2012-07-20

How do I do this using PySpark?

j.doe1

1 Answer


How about just collecting the max value first and then filtering on it?

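from pyspark.sql import functions as F  # avoids shadowing Python's built-in max
# Collect the largest amt to the driver, then keep only the rows that match it
max_amt = df.select(F.max("amt")).collect()[0][0]
df.filter(F.col("amt") == max_amt).select(F.to_date("time")).show()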
+------------+
|todate(time)|
+------------+
|  2012-07-20|
+------------+
Anthony