
In TensorFlow, detecting the number of available GPUs is a trivial task that can be resolved as described here.
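A minimal sketch of that common approach, assuming TensorFlow 2.x, where tf.config.list_physical_devices is available. Note that it counts the GPUs visible to the process. That is not necessarily the number of devices a given MirroredStrategy was configured to use, which is the gap this question is about.

```python
import tensorflow as tf

# List the physical GPU devices TensorFlow can see in this process.
gpus = tf.config.list_physical_devices("GPU")
num_gpus = len(gpus)
print(f"Available GPUs: {num_gpus}")
```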

However, inside a MirroredStrategy, unless the user explicitly provides the number of GPUs being used, it is not possible, to my knowledge, to get the number of active GPUs.

This leads to some unexpected behaviours. For instance, suppose you define a custom Keras metric such as the F1 score by extending the abstract Keras Metric class and then train with 4 GPUs. The per-GPU metric scores are aggregated by sum, so the metric no longer ranges from 0 to 1 but from 0 to 4.
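To make the 0-to-4 range concrete, here is a plain-Python sketch with made-up confusion counts for four hypothetical replicas. It assumes, as described above, that each replica computes its own F1 score and the scores are then combined by sum. For comparison, it also computes F1 once from the summed counts.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


# Hypothetical per-replica confusion counts (tp, fp, fn), one tuple per GPU.
replica_counts = [(8, 2, 0), (5, 0, 5), (9, 1, 1), (6, 3, 2)]

# Each replica's F1 lies in [0, 1], so summing them gives a value in [0, 4].
summed_f1 = sum(f1_from_counts(*counts) for counts in replica_counts)

# Summing the counts first and computing F1 once keeps the result in [0, 1].
tp, fp, fn = (sum(column) for column in zip(*replica_counts))
global_f1 = f1_from_counts(tp, fp, fn)

print(f"sum of per-replica F1: {summed_f1:.3f}")
print(f"F1 from summed counts: {global_f1:.3f}")
```

With these numbers the summed per-replica scores come to about 3.16. The F1 computed from the summed counts (TP=28, FP=6, FN=8) is 56/70 = 0.8.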

This does not happen with the built-in Keras metrics. I am currently reading their source code to understand how they avoid it. I expect, for instance, that they have a method to get the number of GPUs in use and simply divide by it.

Do you have any suggestions?

Luca Cappelletti

1 Answer


So, while I have not found a way to obtain the number of GPUs in use, I have found how to avoid needing it altogether in my use case, which boils down to aggregating the metric's weights correctly.

To aggregate the weights properly, one can pass one of the tf.VariableAggregation values (for example SUM or MEAN) as the aggregation argument of the add_weight method.

My use case also needed sum aggregation of the individual components of a confusion matrix, such as true positives and false positives. To let TensorFlow handle some of the more complex cases, I ended up composing my metric from instances of the Sum metric class, one for each entry of the confusion matrix.
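A minimal sketch of that composition, assuming TensorFlow 2.x with tf.keras and a binary problem thresholded at 0.5. It is not the author's actual code, and the class name BinaryF1FromSums and its threshold parameter are hypothetical. The idea is that each Sum sub-metric keeps its own summed state, so a distribution strategy should combine the counts across replicas, and result() computes F1 only once from the totals.

```python
import tensorflow as tf


class BinaryF1FromSums(tf.keras.metrics.Metric):
    """Binary F1 built from three Sum sub-metrics (hypothetical class name).

    Each Sum accumulates one confusion-matrix entry, so it is the counts,
    not the F1 values, that get summed.
    """

    def __init__(self, threshold=0.5, name="binary_f1_from_sums", **kwargs):
        super().__init__(name=name, **kwargs)
        self.threshold = threshold  # hypothetical decision threshold
        self.true_positives = tf.keras.metrics.Sum(name="tp")
        self.false_positives = tf.keras.metrics.Sum(name="fp")
        self.false_negatives = tf.keras.metrics.Sum(name="fn")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # sample_weight is ignored in this sketch.
        y_true = tf.cast(y_true, tf.bool)
        y_pred = tf.greater_equal(y_pred, self.threshold)

        def count(mask):
            return tf.reduce_sum(tf.cast(mask, tf.float32))

        self.true_positives.update_state(count(tf.logical_and(y_true, y_pred)))
        self.false_positives.update_state(
            count(tf.logical_and(tf.logical_not(y_true), y_pred)))
        self.false_negatives.update_state(
            count(tf.logical_and(y_true, tf.logical_not(y_pred))))

    def result(self):
        tp = self.true_positives.result()
        fp = self.false_positives.result()
        fn = self.false_negatives.result()
        # F1 = 2TP / (2TP + FP + FN), returning 0 instead of NaN when undefined.
        return tf.math.divide_no_nan(2.0 * tp, 2.0 * tp + fp + fn)

    def reset_state(self):
        for metric in (self.true_positives, self.false_positives,
                       self.false_negatives):
            metric.reset_state()


metric = BinaryF1FromSums()
metric.update_state([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.8])
# TP=2, FP=0, FN=1, so F1 = 4/5 = 0.8 (up to float32 rounding).
print(float(metric.result()))
```

Because the sub-metrics only ever add counts, the same result is obtained whether a batch is processed on one device or split across several, which is what removes the need to know the number of GPUs.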

Luca Cappelletti