In federated averaging, does the client optimizer have to be 'SGD' only?
The paper ADAPTIVE FEDERATED OPTIMIZATION states: "One such method is FEDAVG (McMahan et al., 2017), in which clients perform multiple epochs of SGD on their local datasets." Based on this statement, if the clients run Adam on their own loss functions, does that mean it is no longer federated averaging?
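
To make the question concrete, here is a minimal sketch of what I mean by "clients run Adam", written against tff.learning.algorithms.build_weighted_fed_avg. The toy model, input_spec, and learning rate are made up purely for illustration, and the exact module paths and accepted optimizer types may differ between TFF versions:

```python
# A minimal sketch, not verified end to end; paths may vary by TFF version.
import collections

import tensorflow as tf
import tensorflow_federated as tff


def model_fn():
  # Toy model and input_spec, invented just to have something concrete.
  keras_model = tf.keras.Sequential([
      tf.keras.layers.Dense(10, activation='softmax', input_shape=(784,)),
  ])
  return tff.learning.models.from_keras_model(
      keras_model,
      input_spec=collections.OrderedDict(
          x=tf.TensorSpec(shape=(None, 784), dtype=tf.float32),
          y=tf.TensorSpec(shape=(None,), dtype=tf.int32),
      ),
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
  )


# Clients run Adam (not SGD) on their local datasets -- is this still FedAvg?
fed_avg_process = tff.learning.algorithms.build_weighted_fed_avg(
    model_fn=model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.Adam(learning_rate=0.01),
)
```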
What is the difference between 'federated averaging (FedAvg)' and 'adaptive federated optimization (FedOpt)' from the paper linked above?
In other words, what is the difference between tff.learning.algorithms.build_weighted_fed_avg and tff.learning.algorithms.build_weighted_fed_avg_with_optimizer_schedule?
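
And here is how I understand the second builder would be invoked, continuing the sketch above (same imports and model_fn). I am assuming that client_learning_rate_fn maps the round number to a learning rate and that client_optimizer_fn then takes that learning rate and returns the client optimizer; the decay schedule itself is invented only to show the shape of the arguments:

```python
# Made-up per-round learning-rate schedule, just for illustration.
client_lr_fn = lambda round_num: 0.1 * tf.pow(0.99, tf.cast(round_num, tf.float32))

scheduled_process = tff.learning.algorithms.build_weighted_fed_avg_with_optimizer_schedule(
    model_fn=model_fn,
    # My understanding: the learning rate is recomputed each round from the
    # round number and then passed into client_optimizer_fn.
    client_learning_rate_fn=client_lr_fn,
    client_optimizer_fn=lambda lr: tf.keras.optimizers.SGD(learning_rate=lr),
)
```

Is the only difference that the second builder lets the client learning rate change per round, or does it change the algorithm in some deeper way?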