I have a query that computes a moving average from the beginning of time over data stored in a MySQL database. I need to run that query every day so that each run builds on the previous day's value.
Instead of querying my database every time, I use a checkpoint to store the latest date computed so far. But when I restore the checkpoint to get the DataFrame back, I get all the data I used before, including the latest date, in that DataFrame.
I just need a way to avoid re-running my query over the whole MySQL database and use only the latest date's input instead. Is that doable in Spark, and is it recommended?
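Assuming the "moving average over the beginning of time" is a plain (unweighted) cumulative mean of one numeric column, it can be updated from a small running state instead of the full history. Let S_d and N_d be the sum and count of all values up to and including day d, and let s_d and n_d be the sum and count of day d's values alone:

```latex
\begin{aligned}
S_d &= S_{d-1} + s_d, \\
N_d &= N_{d-1} + n_d, \\
\bar{x}_d &= \frac{S_d}{N_d} = \frac{S_{d-1} + s_d}{N_{d-1} + n_d}.
\end{aligned}
```

So each daily run only needs the stored pair (S_{d-1}, N_{d-1}) plus the new day's rows. Storing the previous average alone is not enough; the count has to be kept as well.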
df.checkpoint
RecoverCheckpoint.recover
I do not know whether that is a good approach, since checkpointing is intended for fault tolerance. Is there another way to achieve this?
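One alternative to restoring a checkpoint is to persist only the aggregate state (running total, row count, last processed date) and fold each new day's values into it. The sketch below is a minimal, Spark-free illustration of that idea. It assumes an unweighted cumulative mean and stores the state in a local JSON file; the file and the functions `load_state`, `save_state` and `update_cumulative_average` are hypothetical names used for illustration, and in practice the state could live in a small MySQL table or a file on shared storage.

```python
import json
import os
import tempfile
from datetime import date


def load_state(path):
    """Return the saved aggregate state, or an empty state if none exists yet."""
    if not os.path.exists(path):
        return {"total": 0.0, "count": 0, "last_date": None}
    with open(path) as f:
        return json.load(f)


def save_state(path, state):
    """Write the state via a temp file + os.replace so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def update_cumulative_average(path, day, values):
    """Fold one day's values into the running total and return the new cumulative mean.

    Refuses to process a day that is not newer than the last stored date,
    so re-running the job for the same day does not double-count rows.
    """
    state = load_state(path)
    day_str = day.isoformat()  # ISO dates compare correctly as strings
    if state["last_date"] is not None and day_str <= state["last_date"]:
        raise ValueError(f"{day_str} already processed (last_date={state['last_date']})")
    state["total"] += sum(values)
    state["count"] += len(values)
    state["last_date"] = day_str
    save_state(path, state)
    return state["total"] / state["count"] if state["count"] else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        state_file = os.path.join(workdir, "cumulative_avg_state.json")
        print(update_cumulative_average(state_file, date(2024, 1, 1), [10, 20]))  # 15.0
        print(update_cumulative_average(state_file, date(2024, 1, 2), [30]))      # 20.0
```

In a Spark job, the day's `values` would typically come from a JDBC read restricted to rows newer than `last_date` (for example, by passing a parenthesized subquery with a `WHERE` clause as the table argument of `spark.read.jdbc`), aggregated to a sum and a count. That way only the new rows leave MySQL, and nothing depends on checkpoint internals.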