NOTE: To some extent, this was already asked here, but my question tackles a different aspect of getting the best checkpoint.
In the referenced question, the author only wanted to retrieve the best checkpoint from a set of checkpoints after the Ray Tune run had finished. I want to ensure that only the best checkpoint is saved in the first place. So basically, I am looking for something like:
At this point, the Ray checkpointing callback would be triggered. Check whether the current model state is better than the current "best checkpoint". If so, delete the old "best checkpoint" and replace it by checkpointing the current model state. If not, do not trigger the checkpointing callback.
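To make the desired behavior concrete, here is a minimal sketch in plain Python of the logic described above. It is not Ray code: `BestCheckpointKeeper`, `maybe_save`, the `mode` argument and the pickle-based file format are all hypothetical names and choices I made up for illustration, and the score is assumed to be a single number where higher is better.

```python
import os
import pickle
import tempfile


class BestCheckpointKeeper:
    """Hypothetical helper (not a Ray API): keeps at most one checkpoint on disk."""

    def __init__(self, directory, mode="max"):
        if mode not in ("max", "min"):
            raise ValueError("mode must be 'max' or 'min'")
        self.directory = directory
        self.mode = mode
        self.best_score = None  # score of the checkpoint currently on disk
        self.best_path = None   # path of the checkpoint currently on disk

    def _is_better(self, score):
        # The first score always counts as an improvement.
        if self.best_score is None:
            return True
        if self.mode == "max":
            return score > self.best_score
        return score < self.best_score

    def maybe_save(self, state, score, step):
        """Write `state` only if `score` beats the best so far; return True if saved."""
        if not self._is_better(score):
            return False  # skip checkpointing entirely
        new_path = os.path.join(self.directory, f"checkpoint_{step}.pkl")
        with open(new_path, "wb") as f:
            pickle.dump(state, f)
        # Remove the old best only after the new checkpoint has been written.
        if self.best_path is not None and self.best_path != new_path:
            os.remove(self.best_path)
        self.best_score = score
        self.best_path = new_path
        return True


def demo():
    """Simulate four evaluation steps and report what was saved and what remains."""
    with tempfile.TemporaryDirectory() as tmp:
        keeper = BestCheckpointKeeper(tmp, mode="max")
        saved = []
        for step, accuracy in enumerate([0.50, 0.70, 0.60, 0.90]):
            saved.append(keeper.maybe_save({"step": step}, accuracy, step))
        return saved, sorted(os.listdir(tmp))
```

Calling `demo()` runs four made-up accuracy values through the keeper; the third one is worse than the current best, so nothing is written for it, and only a single checkpoint file is left on disk at the end. What I am asking is how to get this behavior from Ray Tune itself.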
The reason is that I am testing hundreds of large models simultaneously and need to save disk space.