
My pandas code is:

Data = data[data['ObservationDate'] == max(data['ObservationDate'])].reset_index()
Data_world = Data.groupby(["ObservationDate"])[["Confirmed", "Active_case", "Recovered", "Deaths"]].sum().reset_index()
Data_world

The DataFrame structure is this:

SNo     ObservationDate     Province/State  Country/Region  Last Update     Confirmed   Deaths  Recovered   Active_case
0   1   01/22/2020  Anhui   China   1/22/2020 17:00     1   0   0   1
1   2   01/22/2020  Beijing     China   1/22/2020 17:00     14  0   0   14
2   3   01/22/2020  Chongqing   China   1/22/2020 17:00     6   0   0   6
3   4   01/22/2020  Fujian  China   1/22/2020 17:00     1   0   0   1
4   5   01/22/2020  Gansu   China   1/22/2020 17:00     0   0   0   0

and I want output like this:

ObservationDate     Confirmed   Active_case     Recovered   Deaths
0   03/22/2020  335957  223441  97882   14634

How do I filter on the max date in PySpark? This is what I have so far:

max_date = df.select(max("ObservationDate")).first()
group_data = df.groupBy("ObservationDate")
group_data.agg({'Confirmed':'sum', 'Deaths':'sum', 'Recovered':'sum', 'Active_case':'sum'}).show()

1 Answer


I think this is what you want. You can collect the max date first, then use it in a filter before the groupBy and aggregation.

from pyspark.sql import functions as F

# collect the latest ObservationDate as a plain Python value
max_date = df.select(F.max("ObservationDate")).collect()[0][0]

# keep only the rows for that date, then aggregate
df.filter(F.col("ObservationDate") == max_date)\
  .groupBy("ObservationDate")\
  .agg({'Confirmed': 'sum', 'Deaths': 'sum', 'Recovered': 'sum', 'Active_case': 'sum'})\
  .show()
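
If you also want the output column names to match the pandas result (Confirmed, Active_case, Recovered, Deaths rather than sum(Confirmed) and so on), one possible variant is to pass explicit F.sum(...).alias(...) expressions to agg. A minimal sketch, assuming the same df and column names as in the question:

from pyspark.sql import functions as F

# collect the latest ObservationDate as a plain Python value
max_date = df.select(F.max("ObservationDate")).collect()[0][0]

# filter to that date and aggregate, aliasing each sum to the original column name
(df.filter(F.col("ObservationDate") == max_date)
   .groupBy("ObservationDate")
   .agg(F.sum("Confirmed").alias("Confirmed"),
        F.sum("Active_case").alias("Active_case"),
        F.sum("Recovered").alias("Recovered"),
        F.sum("Deaths").alias("Deaths"))
   .show())

The dict form of agg is shorter, but the expression form gives you control over the result column names.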