I am running a TensorFlow training job on a Linux machine with 4 cores. When I check CPU utilization with htop, only one core is fully utilized, while the others sit at roughly 15% (the screenshot below shows htop during training).
How can I make sure TF is using all CPUs to full capacity?
I am aware of the question "Using multiple CPU cores in TensorFlow" - how do I make that work for TensorFlow 2?
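For reference, my understanding is that the session-config approach from that question maps onto the tf.config.threading API in TF 2, roughly like the sketch below (the thread counts are just examples, the calls have to run before any other TF work, and I am not sure this alone solves the utilization problem):

import tensorflow as tf

# Presumed TF 2 counterpart of the TF 1 ConfigProto thread settings.
# Must be called before TensorFlow executes any ops.
tf.config.threading.set_intra_op_parallelism_threads(4)  # threads used inside a single op
tf.config.threading.set_inter_op_parallelism_threads(4)  # threads used across independent ops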
I am using the following code to generate the samples:
import numpy as np
import tensorflow as tf

class WindowGenerator():
    def make_dataset(self, data, stride=1):
        data = np.array(data, dtype=np.float32)
        # Build sliding windows of length total_window_size over the time series
        ds = tf.keras.preprocessing.timeseries_dataset_from_array(
            data=data,
            targets=None,
            sequence_length=self.total_window_size,
            sequence_stride=stride,
            shuffle=False,
            batch_size=self.batch_size,
        )
        # Split each window into (inputs, labels)
        ds = ds.map(self.split_window)
        return ds

    @property
    def train(self):
        return self.make_dataset(self.train_df)

    @property
    def val(self):
        return self.make_dataset(self.val_df)

    @property
    def test(self):
        return self.make_dataset(self.test_df, stride=24)
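In case the input pipeline is the bottleneck, one variant I considered (a hypothetical modification of make_dataset above, not what I currently run) is to parallelize the map step and prefetch batches:

    def make_dataset(self, data, stride=1):
        data = np.array(data, dtype=np.float32)
        ds = tf.keras.preprocessing.timeseries_dataset_from_array(
            data=data,
            targets=None,
            sequence_length=self.total_window_size,
            sequence_stride=stride,
            shuffle=False,
            batch_size=self.batch_size,
        )
        # let tf.data run split_window on several threads and prepare
        # the next batches while the model is training on the current one
        ds = ds.map(self.split_window, num_parallel_calls=tf.data.AUTOTUNE)
        return ds.prefetch(tf.data.AUTOTUNE)

(On older TF 2.x versions tf.data.AUTOTUNE is tf.data.experimental.AUTOTUNE.)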
I'm using the following code to run the model training. sampleMgmt is an instance of the WindowGenerator class, and early_stopping defines the training termination criterion.
history = model.fit(sampleMgmt.train, epochs=self.nrEpochs,
                    validation_data=sampleMgmt.val,
                    callbacks=[early_stopping],
                    verbose=1)
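For completeness, the thread-pool sizes TF reports on this machine can be queried as shown below (a return value of 0 means TF picks a default automatically):

import tensorflow as tf

print("intra-op threads:", tf.config.threading.get_intra_op_parallelism_threads())
print("inter-op threads:", tf.config.threading.get_inter_op_parallelism_threads())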