I want to group a dataframe on a single column and then apply an aggregate function to all the remaining columns.
For example, I have a df with 10 columns. I wish to group on the first column "1" and then apply an aggregate function 'sum' to all the remaining columns (which are all numeric).
The R equivalent of this is summarise_all. For example, in R:
df = df %>% group_by(column_one) %>% summarise_all(funs(sum))
I do not want to list the columns manually in the aggregate command in PySpark, as the number of columns in the dataframe will be dynamic.
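Something along these lines is what I am imagining, building the aggregation list from df.columns rather than typing column names by hand (a rough sketch; the names column_one, col2, col3 and the sample rows are just placeholders):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("groupby-sum-all").getOrCreate()

# Placeholder data; in practice the dataframe already exists.
df = spark.createDataFrame(
    [("a", 1, 10), ("a", 2, 20), ("b", 3, 30)],
    ["column_one", "col2", "col3"],
)

# Build the aggregation dynamically: sum every column except the grouping one.
agg_exprs = [F.sum(c).alias(c) for c in df.columns if c != "column_one"]

result = df.groupBy("column_one").agg(*agg_exprs)
result.show()

Is this the idiomatic way to do it, or is there a built-in equivalent of summarise_all?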