
I'm using TensorFlow's quantization-aware training API and wish to deploy a model with an arbitrary bit-width. Since only 8-bit quantization is supported for TFLite deployment, I will deploy with a custom inference algorithm, but I still need to access the weights of the model in the correct size.

Currently, after quantization-aware training, my model is still in floating point, and as far as I've seen the only way to access the quantized weights is to convert the model to TFLite format. However, that conversion is impossible when using the experimental functions.

Here is my quantize config class:

    import tensorflow_model_optimization as tfmot

    class Quantizer(tfmot.quantization.keras.QuantizeConfig):
        # Configure how to quantize weights.
        def get_weights_and_quantizers(self, layer):
            return [(layer.kernel, tfmot.quantization.keras.quantizers.LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False))]

        # Configure how to quantize activations.
        def get_activations_and_quantizers(self, layer):
            return [(layer.activation, tfmot.quantization.keras.quantizers.MovingAverageQuantizer(num_bits=8, symmetric=False, narrow_range=False, per_axis=False))]

        def set_quantize_weights(self, layer, quantize_weights):
            # Add this line for each item returned in `get_weights_and_quantizers`,
            # in the same order.
            layer.kernel = quantize_weights[0]

        def set_quantize_activations(self, layer, quantize_activations):
            # Add this line for each item returned in `get_activations_and_quantizers`,
            # in the same order.
            layer.activation = quantize_activations[0]

        # Configure how to quantize outputs (may be equivalent to activations).
        def get_output_quantizers(self, layer):
            return []

        def get_config(self):
            return {}

    class ModifiedQuantizer(Quantizer):
        # Configure weights to quantize with 4 bits instead of 8 bits.
        # `quantizer`, `bits`, `symmetric`, `narrow_range` and `per_axis`
        # are defined elsewhere (not shown).
        def get_weights_and_quantizers(self, layer):
            return [(layer.kernel, quantizer(num_bits=bits, symmetric=symmetric, narrow_range=narrow_range, per_axis=per_axis))]

And here is how I quantize the model:

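    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.models import clone_model
    from tensorflow_model_optimization.quantization.keras import (
        quantize_annotate_layer, quantize_apply, quantize_scope)

    # `model`, `models` and `cifar10` are defined elsewhere (not shown).
    supported_layers = [
        tf.keras.layers.Conv2D,
        tf.keras.layers.DepthwiseConv2D
    ]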

    class ModifiedQuantizer(Quantizer):
        # Configure weights to quantize with 4 bits instead of 8 bits.
        def get_weights_and_quantizers(self, layer):
            return [(layer.kernel, quantizer(num_bits=bits, symmetric=symmetric, narrow_range=narrow_range, per_axis=per_axis))]

        # Configure how to quantize activations.
        def get_activations_and_quantizers(self, layer):
            return [(layer.activation, tfmot.quantization.keras.quantizers.MovingAverageQuantizer(num_bits=bits, symmetric=False, narrow_range=False, per_axis=False))]

    def quantize_all_layers(layer):
        for supported_layer in supported_layers:
            if isinstance(layer, supported_layer):
                return quantize_annotate_layer(layer, quantize_config=ModifiedQuantizer())
        # print(layer.name)
        return layer

    annotated_model = clone_model(
        model,
        clone_function=quantize_all_layers
    )

    with quantize_scope(
            {'Quantizer': Quantizer},
            {'ModifiedQuantizer': ModifiedQuantizer},
            {'_relu6': models._relu6}):
        q_aware_model = quantize_apply(annotated_model)

    optimizer = keras.optimizers.Adam(learning_rate=0.001)
    q_aware_model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=optimizer,
        metrics=['sparse_categorical_accuracy'])

    train_images, train_labels, val_images, val_labels, _, _ = cifar10.load()

    q_aware_model.fit(train_images, train_labels, batch_size=64, epochs=1, verbose=1,
                      validation_data=(val_images, val_labels))

As previously said, when using e.g. bits=4 in the ModifiedQuantizer, the model is still saved in floating point, and I don't know how to access the quantized weights.

Thanks!

1 Answer


I suspect you could get the quantized weights by invoking LastValueQuantizer.__call__ on a given layer's weight tensor. How to invoke that method is the question.

The current signature is:

    LastValueQuantizer.__call__(inputs, training, weights, **kwargs)

I assume that inputs is the layer's weights and weights is the value returned by LastValueQuantizer.build. If you could get a reference to the weights returned by build, I would hope it would be straightforward to quantize the layer's weights directly using LastValueQuantizer.__call__.

    In [1]: from tensorflow_model_optimization.quantization.keras.quantizers import LastValueQuantizer
    INFO:tensorflow:Enabling eager execution
    INFO:tensorflow:Enabling v2 tensorshape
    INFO:tensorflow:Enabling resource variables
    INFO:tensorflow:Enabling tensor equality
    INFO:tensorflow:Enabling control flow v2

    In [2]: q = LastValueQuantizer(num_bits=3, per_axis=True, symmetric=True, narrow_range=True)

    In [3]: ??q.__call__
    Signature: q.__call__(inputs, training, weights, **kwargs)
    Source:
      def __call__(self, inputs, training, weights, **kwargs):
        """Quantize tensor.

        Args:
          inputs: Input tensor to be quantized.
          training: Whether the graph is currently training.
          weights: Dictionary of weights the quantizer can use to quantize the
            tensor. This contains the weights created in the `build` function.
          **kwargs: Additional variables which may be passed to the quantizer.

        Returns:
          Quantized tensor.
        """
        return quant_ops.LastValueQuantize(
            inputs,
            weights['min_var'],
            weights['max_var'],
            is_training=training,
            num_bits=self.num_bits,
            per_channel=self.per_axis,
            symmetric=self.symmetric,
            narrow_range=self.narrow_range
        )
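
Once the range is known (the values held in `weights['min_var']` and `weights['max_var']`), the integer codes could in principle be computed by hand. Below is a minimal plain-Python sketch of generic per-tensor affine quantization. It is not TensorFlow's exact implementation, which may adjust the range before rounding, so the results should be checked against the quantizer's own output. The function names `quantize_to_int` and `dequantize` are hypothetical, and the range is assumed to be passed in as plain floats.

```python
def quantize_to_int(values, min_val, max_val, num_bits=4, narrow_range=False):
    """Map floats to integer codes with a generic affine scheme.

    Assumption: min_val/max_val stand in for the learned range of the
    quantizer; this does not reproduce TensorFlow's range adjustment.
    """
    if max_val <= min_val:
        raise ValueError("max_val must be greater than min_val")
    # narrow_range drops the lowest code, e.g. [1, 15] instead of [0, 15].
    qmin = 1 if narrow_range else 0
    qmax = (1 << num_bits) - 1
    scale = (max_val - min_val) / (qmax - qmin)
    # Zero point: the integer code that represents the real value 0.0.
    zero_point = int(round(qmin - min_val / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    codes = []
    for v in values:
        q = int(round(v / scale)) + zero_point
        codes.append(max(qmin, min(qmax, q)))  # clip to the code range
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover the approximate float values from the integer codes."""
    return [(q - zero_point) * scale for q in codes]


if __name__ == "__main__":
    flat_kernel = [-3.5, -1.0, 0.0, 0.6, 3.5]  # made-up example values
    codes, scale, zp = quantize_to_int(
        flat_kernel, -3.5, 3.5, num_bits=4, narrow_range=True)
    print(codes, scale, zp)
    print(dequantize(codes, scale, zp))
```

For a custom inference engine, the integer codes together with `scale` and `zero_point` are what would be stored. A per-axis quantizer would need one range, and therefore one scale and zero point, per output channel.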
Dharman
ndronen
  • Thanks. I will look into this. Currently I have quantized each layer by calculating the scale and zero point for each weight tensor. However, as far as I know, TensorFlow never specifies exactly how it obtains the quantization parameters, so it's quite a makeshift solution. – LucasStromberg Mar 31 '21 at 07:48