
When I run fit() with use_multiprocessing=True, I always get a deadlock and the following warning:

WARNING:tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.

How do I run it properly?

Since the warning says "tf.data", I wonder if transforming my data into that format will make multiprocessing work. What specifically is meant, and how do I convert my data?

My dataset (reproducible):

import numpy as np

Input_shape, labels = (20, 4), 6
LEN_X, LEN_Y = 20000, 3000
train_X, train_Y = np.asarray([np.random.random(Input_shape) for x in range(LEN_X)]), np.random.random((LEN_X, labels))
validation_X, validation_Y = np.asarray([np.random.random(Input_shape) for x in range(LEN_Y)]), np.random.random((LEN_Y, labels))
sampleW = np.random.random((LEN_X, 1))
La-Li-Lu-Le-Low

1 Answer


Multiprocessing doesn't accelerate the model itself; it only accelerates the data loading. And data-loading delay is not a problem when all your data is already in memory.

You could still use multiprocessing, but you would have to make sure the underlying dataset is thread-safe and carefully craft the data pipeline. That is quite time-consuming, so instead I suggest you speed up the model itself.
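That said, since the warning recommends tf.data: for in-memory NumPy arrays like yours, the conversion is essentially one call to tf.data.Dataset.from_tensor_slices. A minimal sketch, reusing the array names from your question (the batch size and shuffle buffer here are illustrative choices, not tuned values):

import tensorflow as tf

# Build a dataset from the in-memory arrays; fit() treats the third
# element of each tuple as per-sample weights.
train_ds = tf.data.Dataset.from_tensor_slices(
    (train_X, train_Y, sampleW.ravel()))  # ravel: weights should be 1-D
train_ds = train_ds.shuffle(buffer_size=LEN_X).batch(32).prefetch(tf.data.AUTOTUNE)

val_ds = tf.data.Dataset.from_tensor_slices((validation_X, validation_Y))
val_ds = val_ds.batch(32).prefetch(tf.data.AUTOTUNE)

# model.fit(train_ds, validation_data=val_ds, epochs=10)

With a tf.data pipeline, parallelism is handled inside TensorFlow (e.g. via prefetch), so you can drop use_multiprocessing=True entirely, which sidesteps the deadlock.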

To speed up the model itself, you should look into:

  • changing the activations of all layers except the last one to ReLU.
  • tweaking the batch size (the optimal number depends on your hardware, and is almost always less than or equal to 32).
  • using batch normalization to speed up convergence.
  • using a higher learning rate (be careful not to overdo this step).
  • if you need faster convolutions, consider using Kaggle notebooks or vast.ai for GPU-enabled computations.
  • last but not least, trying a simpler, smaller model. A minimal sketch combining several of these ideas follows below.
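Here is what such a model could look like for your (20, 4) inputs and 6 continuous labels; the layer widths, loss, and learning rate are illustrative assumptions, not tuned values:

import tensorflow as tf
from tensorflow.keras import layers

# Small model: ReLU activations everywhere except the output layer,
# with batch normalization after each hidden layer.
model = tf.keras.Sequential([
    layers.Input(shape=(20, 4)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),
    layers.Dense(32, activation="relu"),
    layers.BatchNormalization(),
    layers.Dense(6),  # output layer: no ReLU here
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-3),  # a bit above the 1e-3 default
    loss="mse",
)

model.fit(train_X, train_Y,
          sample_weight=sampleW.ravel(),  # Keras expects a 1-D weight array
          validation_data=(validation_X, validation_Y),
          batch_size=32, epochs=10)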

Comment down here if you have any additional questions.
Cheers.

tornikeo
  • I've noticed that for DeepMind's AlphaZero (https://arxiv.org/pdf/1712.01815.pdf), they used mini-batches of size 4096. Is that just them flexing their computing power on us, or is it an exception to the <=32 rule? – Arkleseisure Sep 07 '20 at 14:45
  • @Arkleseisure That's a good question. I'm not sure, but maybe they used the results of a [paper](https://arxiv.org/abs/1705.08741) that advocated huge batch sizes in that very same year. Reinforcement learning is *very* hard to get right; we don't even know how to find correct batch sizes in general yet. – tornikeo Sep 07 '20 at 14:54