Why can't I see the local epochs output when training tensorflow federated learning model?

Question

I am training a TensorFlow Federated learning model, but I cannot see the output of the local epochs. Details are as follows:

split = 4
NUM_ROUNDS = 5
NUM_EPOCHS = 10
BATCH_SIZE = 2
PREFETCH_BUFFER = 5


for round_num in range(1, NUM_ROUNDS+1):
    state, tff_metrics = iterative_process.next(state, federated_train_data) 
    print('round {:2d}, metrics{}'.format(round_num,tff_metrics['train'].items()))
    
    eval_model = create_keras_model()
    eval_model.compile(optimizer=optimizers.Adam(learning_rate=client_lr),
                       loss=losses.BinaryCrossentropy(),
                       metrics=[tf.keras.metrics.Accuracy()])
    
    #tff.learning.assign_weights_to_keras_model(eval_model, state.model)
    state.model.assign_weights_to(eval_model)
    
    ev_result = eval_model.evaluate(x_val, y_val, verbose=2)
    train_metrics = tff_metrics['train']
    # Log each training metric for this round (indentation fixed so the loop parses).
    for name, value in train_metrics.items():
        tf.summary.scalar(name, value, step=round_num)
    
    tff_val_acc.append(ev_result[1])
    tff_val_loss.append(ev_result[0])

And my output looks as follows:


    round  1, metrics=odict_items([('accuracy', 0.0), ('loss', 1.2104079)])
    1/1 - 1s - loss: 0.7230 - accuracy: 0.0000e+00 - 1s/epoch - 1s/step
    round  2, metrics=odict_items([('accuracy', 0.0007142857), ('loss', 1.2233553)])
    1/1 - 1s - loss: 0.6764 - accuracy: 0.0000e+00  - 646ms/epoch - 646ms/step
    round  3, metrics=odict_items([('accuracy', 0.0),  ('loss', 1.1939998)])
    1/1 - 1s - loss: 0.6831 - accuracy: 0.0000e+00  - 635ms/epoch - 635ms/step
    round  4, metrics=odict_items([('accuracy', 0.0), ('loss', 1.2829995)])
    1/1 - 1s - loss: 0.6830 - accuracy: 0.0000e+00  - 641ms/epoch - 641ms/step
    round  5, metrics=odict_items([('accuracy', 0.0),  ('loss', 1.2051892)])
    1/1 - 1s - loss: 0.7135 - accuracy: 0.0000e+00 - 621ms/epoch - 621ms/step

Are these values for global model after each round? How can I plot the curves for validation accuracy of the global model for the 100 epochs (10 rounds, 10 local epochs per round)? (Not in tensorboard)

score 2 · Accepted Answer · answered Jul 14 '22 at 13:25

Why can't I see the local epochs output when training tensorflow federated learning model?

Generally, in federated learning the clients perform local computation that is not visible to the server. In this case, the server (and we as modelers) only sees the result of that local training, not the individual epochs.
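To make this concrete, here is a toy sketch of federated averaging in plain Python. It is not TFF code and not how TFF is implemented; the function names `client_update` and `server_round`, the 1-D model `y = w * x`, and the data are all made up for illustration. The point is only that each client loops over its local epochs internally and hands back a single summary, so the server has nothing per-epoch to print.

```python
# Toy federated averaging on a 1-D model y = w * x (illustration only, not TFF).

def client_update(w_global, examples, num_epochs, lr=0.05):
    """Runs local SGD and returns only a summary of the local training."""
    w = w_global
    total_loss, num_steps = 0.0, 0
    for _ in range(num_epochs):  # local epochs: never reported individually
        for x, y in examples:
            error = w * x - y
            total_loss += error ** 2
            num_steps += 1
            w -= lr * 2 * error * x  # gradient step on the squared error
    # All per-epoch losses are folded into one average before leaving the client.
    return {"weights": w,
            "num_examples": len(examples),
            "loss": total_loss / num_steps}


def server_round(w_global, client_datasets, num_epochs):
    """Aggregates the client summaries into a new global weight and metrics."""
    results = [client_update(w_global, data, num_epochs)
               for data in client_datasets]
    total = sum(r["num_examples"] for r in results)
    new_w = sum(r["weights"] * r["num_examples"] for r in results) / total
    train_loss = sum(r["loss"] * r["num_examples"] for r in results) / total
    return new_w, {"loss": train_loss}


clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]  # data generated by y = 2 * x
w = 0.0
for round_num in range(1, 6):
    w, metrics = server_round(w, clients, num_epochs=10)
    print("round {:2d}, w={:.4f}, metrics={}".format(round_num, w, metrics))
```

Each round prints exactly one line, even though every client ran 10 local epochs, which mirrors the one-line-per-round output in the question.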

Are these values for global model after each round?

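Yes, the logging statements are a mix of both training and validation metrics of the global model after each round. Note that the training metrics in federated learning have a subtle peculiarity: as far as I know, they are aggregated from metrics the clients compute while they train locally, so they do not exactly describe the aggregated global model at the end of the round.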

round  1, metrics=odict_items([('accuracy', 0.0), ('loss', 1.2104079)])

Lines like this one are the training metrics, and are produced by the code:

print('round {:2d}, metrics{}'.format(round_num,tff_metrics['train'].items()))

The validation metrics are printed by Keras. These logging statements:

 1/1 - 1s - loss: 0.7230 - accuracy: 0.0000e+00 - 1s/epoch - 1s/step

are being printed by this line:

ev_result = eval_model.evaluate(x_val, y_val, verbose=2)

How can I plot the curves for validation accuracy of the global model for the 100 epochs (10 rounds, 10 local epochs per round)?

The tff_val_acc and tff_val_loss lists should hold the validation metric values, indexed by round number (index 0 is round 1). A plotting library such as matplotlib (https://matplotlib.org/) is one option for plotting these curves.
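A minimal sketch of such a plot, assuming `tff_val_acc` and `tff_val_loss` were filled by the loop in the question. The helper names `cumulative_local_epochs` and `plot_validation_curves` and the output file name are hypothetical. Because the global model is only evaluated once per round, the curve has one point per round; the x-axis is merely rescaled to the number of local epochs each client has completed by then.

```python
def cumulative_local_epochs(num_rounds, epochs_per_round):
    """x-axis values: local epochs each client has completed after every round."""
    return [round_num * epochs_per_round
            for round_num in range(1, num_rounds + 1)]


def plot_validation_curves(val_acc, val_loss, epochs_per_round,
                           path="validation_curves.png"):
    """Saves validation accuracy and loss curves (one point per round) to `path`."""
    # Imported here so the helper above can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    x = cumulative_local_epochs(len(val_acc), epochs_per_round)
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

    ax_acc.plot(x, val_acc, marker="o")
    ax_acc.set_xlabel("local epochs completed")
    ax_acc.set_ylabel("validation accuracy")

    ax_loss.plot(x, val_loss, marker="o")
    ax_loss.set_xlabel("local epochs completed")
    ax_loss.set_ylabel("validation loss")

    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

After the training loop in the question, one would call `plot_validation_curves(tff_val_acc, tff_val_loss, NUM_EPOCHS)`. To get a point for every local epoch, the clients themselves would have to report intermediate results, which the standard training loop shown here does not do.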

Thank you so much @Zachary for the detailed explanation. I have another question: When trained with the `build_weighted_fed_avg_with_optimizer_schedule` algorithm training accuracy stay constant at around 0.5. I tried with several different learning rates ranging from 0.01 to 0.001. This behavior is not observed with other algorithms like fedprox and fedavg. Could you please tell me what could be the issue? — Dushi Fdz, Jul 15 '22 at 04:49
could you please have a look at this question please? https://stackoverflow.com/questions/73024656/in-fedavg-what-is-the-client-optimizer — Dushi Fdz, Jul 18 '22 at 15:16
