In TFF, it is necessary to determine the number of rounds. So, to obtain optimal performance from our model, how can we know the optimal number of rounds?
1 Answer
TFF does not actually require you to specify the number of rounds for federated training beforehand. TFF is more about specifying the federated aspect of your computation (which you can essentially think of as specifying the communication), and considers actually "running" the rounds to be a system-level concern.
When you write TFF, generally you are writing at three levels (explanation of this statement here); the question you are asking (and indeed every concern TFF considers a "system concern") is at the Python level. Since Python controls the actual invocation of your computation written in TFF, you can stop training with any criterion expressible in Python. E.g., if you want to monitor performance on a validation set and use that as a stopping criterion, this is entirely doable. If you have a `tff.utils.IterativeProcess` `ip` and an evaluation function `eval_fn` (see here for an example), this could be implemented as something like:
state = ip.initialize()  # obtain the initial server state
while True:
    data = sample_client_data()  # sample a cohort of clients for this round
    state, metrics = ip.next(state, data)  # run one federated round
    eval_metrics = eval_fn(state)  # evaluate the current global model
    if condition(eval_metrics):  # any stopping criterion expressible in Python
        break
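Here `sample_client_data` and `condition` are placeholders for your own client-sampling and stopping logic; `state` is produced by `ip.initialize()` and threaded through each call to `ip.next`.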
Abstractly: since Python drives the experiment process, you can stop whenever you want, based on any observable characteristic of the training procedure. Therefore you do not in fact need to know how many rounds you will be running beforehand.
A more direct answer to the original question is, I think at this point in the history of FL, not quite achievable for the general case; nobody (as far as I am aware) knows of reliable system-level settings for FL at this point. This is not surprising; it is somewhat akin to knowing beforehand how many epochs one should specify in datacenter training, which tends to be quite problem-dependent. FL is similar in this regard. Practically speaking, my advice tends to be: monitor performance on a validation set, run for as long as you can, and keep around the `state` of the model that performed best on that validation set. I think a more general answer than this may be quite difficult.
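To make that advice concrete, here is a minimal sketch of the keep-the-best-`state` pattern. It reuses the hypothetical `ip`, `sample_client_data`, and `eval_fn` from above, and additionally assumes that `eval_fn` returns a dict containing a `loss` metric and that a patience of 10 rounds is acceptable; both are illustrative choices, not anything prescribed by TFF.

```python
PATIENCE = 10  # rounds to wait without improvement before stopping

state = ip.initialize()
best_state, best_loss = state, float('inf')
rounds_since_improvement = 0

while rounds_since_improvement < PATIENCE:
    data = sample_client_data()
    state, metrics = ip.next(state, data)
    eval_metrics = eval_fn(state)
    if eval_metrics['loss'] < best_loss:  # assumes eval_fn reports a 'loss'
        best_state, best_loss = state, eval_metrics['loss']
        rounds_since_improvement = 0
    else:
        rounds_since_improvement += 1

# best_state now holds the highest-performing model seen during training.
```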

Thank you very much, that was an important answer: monitor performance on a validation set (evaluation is invoked on the latest state we arrived at during training; in order to extract the latest trained model from the server state, you simply access the `.model` member, as follows: `train_metrics = evaluation(state.model, federated_train_data)`). I understand from this that the latest state of the model is provided after a predetermined number of rounds. What do you think about this? – Ayness Mar 12 '20 at 13:51
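For context, the snippet quoted in the comment follows the pattern from the TFF federated learning tutorials of that era; a rough sketch (where `model_fn` is your own no-argument function returning a `tff.learning.Model`, and `federated_train_data` is your own list of client datasets, neither shown here) looks like:

```python
import tensorflow as tf
import tensorflow_federated as tff

# Build the training and evaluation computations.
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02))
evaluation = tff.learning.build_federated_evaluation(model_fn)

state = iterative_process.initialize()
for _ in range(10):  # the round count here is arbitrary, not something
                     # the API forces you to fix in advance
    state, metrics = iterative_process.next(state, federated_train_data)

# Evaluation runs on whatever state training has reached so far:
train_metrics = evaluation(state.model, federated_train_data)
```

The point of the answer stands: `evaluation(state.model, ...)` accepts whatever `state` you have at the moment you call it, so the number of rounds need not be predetermined; the loop above could just as well be the validation-driven `while` loop shown earlier.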