
I am trying to do 4-bit quantization and used this example. First of all, I received the following warnings:

WARNING:tensorflow:AutoGraph could not transform <bound method Default8BitQuantizeConfig.set_quantize_activations of <tensorflow_model_optimization.python.core.quantization.keras.default_8bit.default_8bit_quantize_registry.Default8BitQuantizeConfig object at 0x7fb0208015c0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: expected an indented block (<unknown>, line 14)
WARNING: AutoGraph could not transform <bound method Default8BitQuantizeConfig.set_quantize_activations of <tensorflow_model_optimization.python.core.quantization.keras.default_8bit.default_8bit_quantize_registry.Default8BitQuantizeConfig object at 0x7fb020806550>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: expected an indented block (<unknown>, line 14)

Then, after reading this doc, I found that it is possible to quantize my network to 4 bits, but I couldn't understand whether this is possible only for Dense layers or for all layer types (like Conv2D); see the Conv2D sketch after the update below.

I also don't understand how to work with the quantized weights, since NumPy only ever gives me float32 values back.

UPD: I finally figured out how to perform quantization-aware training:

import tensorflow as tf
from tensorflow import keras
import tensorflow_model_optimization as tfmot

LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer

class DefaultDenseQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # Configure how to quantize weights: 4-bit, symmetric, per-tensor.
    def get_weights_and_quantizers(self, layer):
        return [(layer.kernel, LastValueQuantizer(
            num_bits=4, symmetric=True, narrow_range=False, per_axis=False))]

    # Configure how to quantize activations: 4-bit, asymmetric.
    def get_activations_and_quantizers(self, layer):
        return [(layer.activation, MovingAverageQuantizer(
            num_bits=4, symmetric=False, narrow_range=False, per_axis=False))]

    def set_quantize_weights(self, layer, quantize_weights):
        # One assignment per item returned by `get_weights_and_quantizers`,
        # in the same order.
        layer.kernel = quantize_weights[0]

    def set_quantize_activations(self, layer, quantize_activations):
        # One assignment per item returned by `get_activations_and_quantizers`,
        # in the same order.
        layer.activation = quantize_activations[0]

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}

# x_train, y_train, x_test, y_test are assumed to be loaded already.
QAT_model = tfmot.quantization.keras.quantize_annotate_model(keras.Sequential([
    tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.Dense(2, activation='relu', input_shape=x_train.shape[1:]),
        DefaultDenseQuantizeConfig()),
    tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.Dense(2, activation='relu'),
        DefaultDenseQuantizeConfig()),
    tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.Dense(10, activation='softmax'),
        DefaultDenseQuantizeConfig()),
]))

# `quantize_apply` must be able to deserialize the custom config, hence
# the `quantize_scope`.
with tfmot.quantization.keras.quantize_scope(
        {'DefaultDenseQuantizeConfig': DefaultDenseQuantizeConfig}):
    # Use `quantize_apply` to actually make the model quantization aware.
    quantized_model = tfmot.quantization.keras.quantize_apply(QAT_model)

quantized_model.summary()

quantized_model.compile(optimizer='adam',
                        loss='sparse_categorical_crossentropy',
                        metrics=['accuracy'])

quantized_model.fit(x_train, y_train, epochs=3)

val_loss, val_acc = quantized_model.evaluate(x_test, y_test)
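
Regarding the Conv2D part of my question above: as far as I can tell, the same config should attach to Conv2D as well, because the config only touches layer.kernel and layer.activation, and Conv2D exposes both. A minimal sketch reusing DefaultDenseQuantizeConfig (the layer parameters are just illustrative, and I have not verified this end to end):

conv_annotated = tfmot.quantization.keras.quantize_annotate_layer(
    tf.keras.layers.Conv2D(16, 3, activation='relu'),  # illustrative layer
    DefaultDenseQuantizeConfig())  # reused: Conv2D also has a `kernel` attribute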

However, I still can't understand how to access the 4-bit quantized weights. I used np.array(quantized_model.get_weights()), but of course it gave me float32; moreover, the number of elements in the quantized model is smaller than in the original model. How can this be explained?
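
My current understanding is that quantization-aware training only simulates quantization: the stored kernels stay float32 (fake quantization), and the quantizers just add min/max range variables next to them. This is how I have been inspecting the wrapped layers; the fake_quant_4bit function is my own hypothetical reconstruction of a 4-bit symmetric rounding step, not the actual tfmot internals:

# List every variable the quantize wrappers now hold; the kernels stay
# float32, and extra quantizer range variables appear alongside them.
for layer in quantized_model.layers:
    for w in layer.weights:
        print(w.name, w.shape, w.dtype)

# Hypothetical 4-bit symmetric fake-quantization, to see which discrete
# values a kernel gets rounded to (output is still float32, but it only
# takes 2**4 - 1 = 15 distinct values in this scheme):
def fake_quant_4bit(kernel):
    max_abs = tf.reduce_max(tf.abs(kernel))
    scale = max_abs / (2 ** (4 - 1) - 1)
    return tf.round(kernel / scale) * scale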

  • I also have the same problem of accessing the quantized weights, or of taking the quantization-aware float32 weights and correctly quantizing them. – gihan Oct 23 '20 at 00:03
