The following two papers present creative work in distributed machine learning. I would like to know how to calculate the data communication cost and the workload balance ratio that they report.
[1] J. Chen et al. A Bi-layered Parallel Training Architecture for Large-scale Convolutional Neural Networks. IEEE Transactions on Parallel and Distributed Systems. 2018.
This paper gives a Data Communication and Workload Balancing Analysis in Section 5.4 and shows a comparison of data communication and workload balancing in Fig. 15. I do not know which parallel framework the paper uses: MPI or Spark?
[2] J. Chen et al. A Parallel Random Forest Algorithm for Big Data in a Spark Cloud Computing Environment. IEEE Transactions on Parallel and Distributed Systems. 2017.
This paper gives a Resource and Workload Balance Analysis in Section 4.3.4 but does not provide a quantitative workload balance for a specific dataset. It presents a Data Communication Analysis in Section 4.3.3 and also reports data communication costs in Fig. 15.
My questions are as follows:
(1) How do I compute the data communication cost and the workload balance ratio in a Spark cluster?
(2) How do I compute the data communication cost and the workload balance ratio in an MPI cluster?
(3) Which parallel framework does the paper "A Bi-layered Parallel Training Architecture for Large-scale Convolutional Neural Networks" use?