
The following two papers present creative work in distributed machine learning. I would like to know how to calculate the data communication cost and the workload balance ratio that they report.

[1] J. Chen et al. A Bi-layered Parallel Training Architecture for Large-scale Convolutional Neural Networks. IEEE Transactions on Parallel and Distributed Systems. 2018.

This paper gives a Data Communication and Workload Balancing Analysis in Section 5.4 and shows a comparison of data communication and workload balancing in Fig. 15. I do not know which parallel framework the paper uses: MPI or Spark?

[2] J. Chen et al. A Parallel Random Forest Algorithm for Big Data in a Spark Cloud Computing Environment. IEEE Transactions on Parallel and Distributed Systems. 2017.

This paper gives a Resource and Workload Balance Analysis in Section 4.3.4 but does not provide a quantitative workload balance for a specific dataset. It presents a Data Communication Analysis in Section 4.3.3 and also reports data communication costs in Fig. 15.

My questions are as follows:

(1) How do I compute the data communication cost and the workload balance ratio on a Spark cluster?

(2) How do I compute the data communication cost and the workload balance ratio on an MPI cluster?

(3) Which parallel framework does the paper "A Bi-layered Parallel Training Architecture for Large-scale Convolutional Neural Networks" use?
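
To make the two quantities concrete, here is a minimal sketch of one possible definition. It assumes the workload balance ratio is the mean per-worker busy time divided by the slowest worker's busy time, and that the communication cost is the total number of bytes the workers send. Both definitions, the function names, and the sample numbers are my own assumptions for illustration and are not necessarily what [1] or [2] use; the inputs are plain lists, however they were collected.

```python
from statistics import mean


def workload_balance_ratio(worker_times):
    """Mean busy time divided by the slowest worker's busy time.

    Assumed definition (not taken from the papers): 1.0 means perfectly
    balanced, and values near 0 mean one worker dominates the run time.
    """
    if not worker_times:
        raise ValueError("worker_times must not be empty")
    slowest = max(worker_times)
    if slowest == 0:
        return 1.0  # no worker did any work, so nothing is unbalanced
    return mean(worker_times) / slowest


def communication_cost(bytes_sent_per_worker):
    """Total bytes sent by all workers (assumed definition)."""
    return sum(bytes_sent_per_worker)


# Hypothetical measurements for three workers.
times_s = [5.0, 10.0, 15.0]
bytes_sent = [100, 200, 300]

print(workload_balance_ratio(times_s))
print(communication_cost(bytes_sent))
```

What I am missing is how to obtain the per-worker times and byte counts on each framework.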

  • This may answer the MPI part: Tools to measure MPI communication costs [link](https://stackoverflow.com/questions/10607750/tools-to-measure-mpi-communication-costs?rq=1) – Dajiang Lei Feb 21 '19 at 13:35
  • These questions, especially (3), are best addressed to the authors of the respective papers. – Zulan Feb 21 '19 at 13:49
  • For (1), I want to know the implementation details: which API or configuration property to use. – Dajiang Lei Feb 22 '19 at 01:33
  • Have a look at Score-P. It is quite a good solution for monitoring parallel codes... – David Daverio Feb 23 '19 at 14:45
