I read about DataFrame checkpointing in PySpark and it looks like a good fit for my needs, but I couldn't find a good example of how to use it.
My questions are:
Should I specify the checkpoint dir? Or is it possible to just call it like this:
df.checkpoint()
Are there any optional params that I should be aware of?
Is there a default checkpoint dir, or must I set one myself?
When I checkpoint a DataFrame and then reuse it, does it automatically read the data back from the dir where the files were written?
It would be great if you could share an example of using checkpoint in PySpark, with some explanation. Thanks!