Is the following possible in PySpark? Assume the following DataFrame df:
| model  | year | price | mileage |
+++++++++++++++++++++++++++++++++++
| Galaxy | 2017 | 27841 | 17529   |
| Galaxy | 2017 | 29395 | 11892   |
| Novato | 2018 | 35644 | 22876   |
| Novato | 2018 | 8765  | 54817   |
+++++++++++++++++++++++++++++++++++
I would like to run something like this:
df.groupBy('model', 'year')\
    .agg({'price':'sum'})\
    .agg({'mileage':'sum'})\
    .withColumnRenamed('sum(price)', 'total_prices')\
    .withColumnRenamed('sum(mileage)', 'total_miles')
Hopefully resulting in:
| model  | year | price | mileage | total_prices | total_miles |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| Galaxy | 2017 | 27841 | 17529   | 57236        | 29421       |
| Galaxy | 2017 | 29395 | 11892   | 57236        | 29421       |
| Novato | 2018 | 35644 | 22876   | 44409        | 77693       |
| Novato | 2018 | 8765  | 54817   | 44409        | 77693       |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
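To pin down exactly what I mean by the expected output, here is a plain-Python reference sketch (not PySpark) that reproduces it: sum price and mileage per (model, year) group, then attach those totals to every original row. The names rows, totals and expected are only illustrative.

```python
from collections import defaultdict

# Sample rows copied from the table above: (model, year, price, mileage).
rows = [
    ("Galaxy", 2017, 27841, 17529),
    ("Galaxy", 2017, 29395, 11892),
    ("Novato", 2018, 35644, 22876),
    ("Novato", 2018, 8765, 54817),
]

# Accumulate [total_price, total_mileage] per (model, year) group.
totals = defaultdict(lambda: [0, 0])
for model, year, price, mileage in rows:
    totals[(model, year)][0] += price
    totals[(model, year)][1] += mileage

# Attach the group totals to every original row, keeping all four rows.
expected = [
    (model, year, price, mileage, *totals[(model, year)])
    for model, year, price, mileage in rows
]

for row in expected:
    print(row)
```

Note that the original rows are all kept, whereas groupBy collapses each group to a single row. My assumption is that the PySpark counterpart would be a window aggregate added with withColumn, for example F.sum('price').over(Window.partitionBy('model', 'year')), but I have not verified that here.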