I have data like the following in a dataframe:

CommsId  Id  Amount  Date
85       1   10      07/10/2020
72       1   15      09/09/2021
85       1   25      09/09/2021
70       1   30      09/09/2021
72       1   -15     05/11/2020
70       1   -30     05/11/2020

For each date, I want to find the sum of the amounts of the latest record per CommsId as of that date.

Expected output is as below:

Date        Sum_Amount  Id
07/10/2020  10          1
09/09/2021  70          1
05/11/2021  25          1
asked by sparc
    Does this answer your question? [Spark Dataframe - How to keep only latest record for each group based on ID and Date?](https://stackoverflow.com/questions/59886143/spark-dataframe-how-to-keep-only-latest-record-for-each-group-based-on-id-and) – wwnde Feb 06 '22 at 23:47
  • Hi @wwnde, I have tried that, I want to do that for each Date. How do I do that? – sparc Feb 07 '22 at 05:26
  • step 1 - retain records with the latest `CommsId` for each group. step 2 - sum up the remaining records of `Amount` at the required group level. – samkart Feb 07 '22 at 15:08
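samkart's two-step suggestion could be sketched as follows in pandas (the same logic ports to Spark with a window over `CommsId` ordered by `Date`). This is one reading of the question, assuming day-first dates and that "latest as of the date" means: for each date in the data, keep each `CommsId`'s most recent record on or before that date, then sum `Amount`. The helper `sum_latest_as_of` is a hypothetical name, and this reading does not reproduce every row of the expected output above, so the intended logic may differ.

```python
import pandas as pd

# Sample data from the question; dates assumed to be dd/MM/yyyy.
df = pd.DataFrame({
    "CommsId": [85, 72, 85, 70, 72, 70],
    "Id": [1, 1, 1, 1, 1, 1],
    "Amount": [10, 15, 25, 30, -15, -30],
    "Date": ["07/10/2020", "09/09/2021", "09/09/2021",
             "09/09/2021", "05/11/2020", "05/11/2020"],
})
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)

def sum_latest_as_of(df, as_of):
    """Step 1: keep each CommsId's latest record on or before `as_of`.
    Step 2: sum Amount over the remaining records."""
    upto = df[df["Date"] <= as_of]
    latest = upto.sort_values("Date").groupby("CommsId").tail(1)
    return latest["Amount"].sum()

# Evaluate the as-of sum at every distinct date in the data.
result = pd.DataFrame({"Date": sorted(df["Date"].unique())})
result["Sum_Amount"] = result["Date"].map(lambda t: sum_latest_as_of(df, t))
print(result)
```

Under this reading, 07/10/2020 gives 10 (only CommsId 85 exists yet) and 09/09/2021 gives 70 (latest rows are 25, 15, 30), matching the first two expected rows.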

0 Answers