PySpark: find the date with the highest amt

Question

I have the following df:

time                 loc  amt
2012-07-20 01:00:00  A    3300
2012-01-04 17:29:00  B    300
2012-07-20 01:00:00  A    200
2012-01-04 17:29:00  B    500
2012-01-04 17:29:00  C    333

I would like to output the date that had the highest amt.

output: 2012-07-20

How do I do this using pyspark?

score 0 · Answer 1 · answered Apr 30 '16 at 21:52

How about just collecting the max value first and then filtering on it?

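from pyspark.sql.functions import lit, max as max_, to_date  # alias avoids shadowing the built-in max

max_amt = df.select(max_(df.amt)).collect()[0][0]  # largest amt as a plain Python value
df.filter(df.amt == lit(max_amt)).select(to_date('time')).show()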
+------------+
|todate(time)|
+------------+
|  2012-07-20|
+------------+

answered Apr 30 '16 at 21:52

Anthony