
I have two laptops and want to use both for DL model training. I don't have any experience with distributed systems and want to know whether it is possible to combine the processing power of two laptops to train a single model. What about tf.distribute.experimental.ParameterServerStrategy? Would it be of any use?

superduper
  • Check the docs here https://www.tensorflow.org/guide/distributed_training – ZWang Jul 23 '20 at 02:44
  • It doesn't say how to combine two machines to use the processing power of both. – superduper Jul 23 '20 at 18:47
  • I think model parallelism is what you are looking for. Please refer to this talk on **Mesh TensorFlow**: https://www.youtube.com/watch?v=HgGyWS40g-g&list=PL6LsUGheZdT8te2nsOnFpzDQ2Am19VCYd&index=263. Thanks! –  Sep 06 '20 at 13:29

1 Answer


Yes, you can use multiple machines to train a single model. You need to set up the cluster and worker configuration on both machines through the TF_CONFIG environment variable, like below.

# Contents of the TF_CONFIG environment variable. The same cluster
# definition is used on both machines, but 'index' identifies each
# worker: 0 on the first machine, 1 on the second. Replace localhost
# with the machines' actual IP addresses so they can reach each other.
tf_config = {
    'cluster': {
        'worker': ['localhost:12345', 'localhost:23456']
    },
    'task': {'type': 'worker', 'index': 0}
}
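
With that configuration in place, both machines run the same training script. Below is a minimal sketch of how it would look with tf.distribute.MultiWorkerMirroredStrategy (the strategy that tutorial uses); the model, data, and port numbers here are placeholders, and on older TF 2.x releases the strategy lives under tf.distribute.experimental instead.

import json
import os

import numpy as np
import tensorflow as tf

# Each machine exports TF_CONFIG before the strategy is created.
# On the second laptop, change 'index' to 1 and use real IP addresses.
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {'worker': ['localhost:12345', 'localhost:23456']},
    'task': {'type': 'worker', 'index': 0}
})

strategy = tf.distribute.MultiWorkerMirroredStrategy()

# Variables must be created inside the strategy scope so they are
# mirrored and kept in sync across the two workers.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')

# Toy data purely for illustration; both workers run this same script,
# and Keras shards the batches across them.
x = np.random.random((256, 10)).astype('float32')
y = np.random.random((256, 1)).astype('float32')
model.fit(x, y, epochs=2, batch_size=32)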

This TensorFlow tutorial on Multi-worker training with Keras (https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras) shows all the details about the configuration and about training your model.

Hope this answers your question.