I am converting SQL code into PySpark.
The SQL code uses ROLLUP to sum up the count for each state.
I am trying to do the same thing in PySpark, but I don't know how to get the total count row.
I have a table with state, city, and count, and I want to add a total count for each state at the end of that state's section.
This is a sample input:
State  City       Count
WA     Seattle    10
WA     Tacoma     11
MA     Boston     11
MA     Cambridge  3
MA     Quincy     5
This is my desired output:
State  City       Count
WA     Seattle    10
WA     Tacoma     11
WA     Total      21
MA     Boston     11
MA     Cambridge  3
MA     Quincy     5
MA     Total      19
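In case it helps, this is how the sample input can be built in PySpark (the DataFrame name df and the column names State, City, and Count are just the ones I am using here):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# sample input from above
df = spark.createDataFrame(
    [("WA", "Seattle", 10),
     ("WA", "Tacoma", 11),
     ("MA", "Boston", 11),
     ("MA", "Cambridge", 3),
     ("MA", "Quincy", 5)],
    ["State", "City", "Count"],
)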
I don't know how to add the total count rows in between the states.
I did try rollup; here is my code:
from pyspark.sql import functions as F
df2 = df.rollup('State').agg(F.sum('Count').alias('Count'))
and the result shows up like this:
State  Count
WA     21
MA     19
But I want the Total row to come after each state's rows, as in the desired output above.
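Is something like the sketch below the right direction? The idea is to roll up over both State and City, relabel each state's subtotal row (where City comes back null) as "Total", drop the grand-total row, and sort so the Total row lands at the end of each state's section. It assumes the df built above with columns State, City, and Count.

from pyspark.sql import functions as F

result = (
    df.rollup("State", "City")
      .agg(F.sum("Count").alias("Count"))
      # the grand-total row has both State and City as null; drop it
      .filter(F.col("State").isNotNull())
      # per-state subtotal rows have a null City; label them "Total"
      .withColumn("City", F.coalesce(F.col("City"), F.lit("Total")))
      # sort the Total row to the end of each state's section
      .orderBy("State", F.col("City") == "Total", "City")
)

result.show()

One thing I am not sure about: orderBy("State", ...) sorts the states alphabetically (MA before WA), so keeping the original state order from my desired output would probably need an extra ordering column.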