I have a dynamic dataset like the one below, which is updated every day. For example, on Jan 11 the data is:
Name | Id |
---|---|
John | 35 |
Marrie | 27 |
On Jan 12, the data is:
Name | Id |
---|---|
John | 35 |
Marrie | 27 |
MARTIN | 42 |
I need to take the count of the records and append it to a separate dataset. For example, on Jan 11 my output dataset is:
Count | Date |
---|---|
2 | 11-01-2023 |
On Jan 12 my output dataset should be:
Count | Date |
---|---|
2 | 11-01-2023 |
3 | 12-01-2023 |
and so on for all other days, whenever the code is run.

This has to be done using PySpark.

I tried using `semantic_version` in the incremental function, but it is not giving the desired result.