Accuracy Degradation After Quantization
Why does accuracy degrade?
In general, when a model is quantized to lower precision, the distribution of each layer's activations inevitably shifts. This shift accumulates layer by layer, ultimately degrading accuracy.
Illustrating Quantization Error
Here is a toy example of quantization error using an asymmetric int8 quantization setting.
import numpy as np
# Values to quantize; note the outlier 165.52.
x = np.array((-51.35, 165.52, 0.4, -0.4, 0.5, 1.2))
# Scale: the full value range mapped onto the 256 uint8 levels.
s = (165.52 + 51.35) / 255
# Zero point: the uint8 value that represents real-valued 0.
z = -np.round(-51.35 / s)
# Quantize, then dequantize to observe the round-trip error.
q_x = np.clip(np.round(x / s) + z, 0, 255)
q_dq_x = s * (q_x - z)
np.set_printoptions(precision=2)
print("original_x: ", x)
print("quantized_x: ", q_x)
print("quantized_dequantized_x: ", q_dq_x)
After executing the code, the printed variables will be:
original_x: [-51.35, 165.52, 0.4, -0.4, 0.5, 1.2]
quantized_x: [0., 255., 60., 60., 61., 61.]
quantized_dequantized_x: [-51.03, 165.84, 0., 0., 0.85, 0.85]
This demonstrates how quantization error is produced and causes some entries of x to become indistinguishable after the round trip:
[0.4, -0.4] both map to [0., 0.]
[0.5, 1.2] both map to [0.85, 0.85]
Distribution of Weights Matters
From the example above, we can conclude that values distributed within a small range, or spread evenly across their range, preserve higher fidelity when represented in int8 precision.
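To make this concrete, here is a minimal sketch comparing the round-trip error of a narrow-range vector against the same vector with one outlier added. The helper mirrors the asymmetric scheme used earlier; the input values are illustrative.

```python
import numpy as np

def int8_asymmetric_roundtrip(x):
    """Quantize x to uint8 with an asymmetric scheme, then dequantize."""
    s = (x.max() - x.min()) / 255          # scale from the full value range
    z = -np.round(x.min() / s)             # zero point
    q = np.clip(np.round(x / s) + z, 0, 255)
    return s * (q - z)

# Narrow-range values: the round-trip error is tiny.
narrow = np.array([-0.4, 0.4, 0.5, 1.2])
# Same values plus one outlier: the scale grows ~100x and the
# small entries collapse onto shared quantized values.
wide = np.array([-0.4, 0.4, 0.5, 1.2, 165.52])

print(np.abs(narrow - int8_asymmetric_roundtrip(narrow)).max())
print(np.abs(wide - int8_asymmetric_roundtrip(wide)).max())
```

The single outlier inflates the scale, so every small entry is represented far more coarsely than before.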
Let's examine the convolutional layers in the RTMDet-s.onnx model, which we exported using the provided script and model checkpoint from OpenMMLab's GitHub repository.
The plot below displays the weights of a specific convolutional layer in the RTMDet-s model. This layer can be found by searching for the node name /backbone/stage1/stage1.1/blocks/blocks.0/conv2/depthwise_conv/conv/Conv.
Each boxplot provides an overview of the weight values for a convolutional kernel. Clearly, the weight values in kernel indices 11 and 19 have significant outliers. Given how quantization works, these outliers are likely to severely damage the functionality of this convolutional layer under a per-tensor quantization scheme.
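The following sketch illustrates why such outliers hurt per-tensor quantization and how a per-channel scheme mitigates the damage. The weight tensor here is synthetic (random kernels with one injected outlier mimicking the pattern above); the shapes and values are not taken from the actual model, and the symmetric int8 scheme is just one common choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy depthwise-conv weights: 24 kernels of 9 weights each,
# with one large outlier injected into kernel 11.
w = rng.uniform(-1.0, 1.0, size=(24, 9))
w[11, 0] = 50.0

def roundtrip(x):
    """Symmetric int8 quantize-dequantize over the given array."""
    s = np.abs(x).max() / 127
    return s * np.clip(np.round(x / s), -127, 127)

# Per-tensor: one scale for all kernels, dominated by the outlier.
per_tensor_err = np.abs(w - roundtrip(w)).mean()
# Per-channel: an independent scale per kernel, so only the
# outlier kernel pays the price.
per_channel = np.stack([roundtrip(k) for k in w])
per_channel_err = np.abs(w - per_channel).mean()

print("per-tensor mean error: ", per_tensor_err)
print("per-channel mean error:", per_channel_err)
```

With a per-tensor scale, every kernel inherits the coarse resolution forced by the outlier; with per-channel scales, the 23 well-behaved kernels keep their fine resolution.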
Weights Comparison
To determine whether significant outliers are common in object detection models, the following compares the binned distribution of weight ranges (max(weights) - min(weights)) for FasterRCNN, YOLOv10-N, and RTMDet-s, with each convolutional layer contributing a single count.
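A minimal sketch of this per-layer range computation is shown below. The weight tensors here are synthetic stand-ins; for a real model one would extract each Conv node's weight initializer (e.g. via onnx.numpy_helper.to_array) and feed the arrays in. The 20.0 flagging threshold and the layer names are illustrative, not part of the original analysis.

```python
import numpy as np

def weight_ranges(weights):
    """max(w) - min(w) for each layer's weight tensor, one value per layer."""
    return {name: float(w.max() - w.min()) for name, w in weights.items()}

# Synthetic stand-ins for conv weights extracted from a model.
rng = np.random.default_rng(0)
weights = {
    "conv_a": rng.normal(0.0, 0.1, size=(16, 3, 3, 3)),    # well-behaved
    "conv_b": rng.normal(0.0, 0.1, size=(16, 16, 3, 3)),   # well-behaved
    "conv_c": np.append(rng.normal(0.0, 0.1, 100), 95.0),  # one large outlier
}

ranges = weight_ranges(weights)
# Layers with an unusually large range are the ones most likely to
# lose fidelity under per-tensor int8 quantization.
risky = [n for n, r in ranges.items() if r > 20.0]
print(ranges)
print("risky layers:", risky)
```

Binning the resulting range values (e.g. with np.histogram) yields the per-model distributions compared next.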
Comparing YOLOv10-N with RTMDet-s, the maximum weight range for YOLOv10-N is around 20, while RTMDet-s has weight ranges of up to around 200.
After quantizing both models to int8 precision, YOLOv10-N maintained a reasonable level of accuracy, whereas RTMDet-s became unusable, producing nonsensical results.
Comparing FasterRCNN and YOLOv10-N, FasterRCNN has more convolutional layers with smaller weight ranges. One reason for this could be that YOLOv10-N is anchor-free, whereas FasterRCNN is anchor-based: an anchor-free model regresses box coordinates more directly, so it is intuitive that it might learn larger weights in order to produce larger output values.
Despite both YOLOv10-N and RTMDet-s being anchor-free models, YOLOv10-N has a better weight range distribution. This suggests that it is possible to train an anchor-free object detection model with relatively smaller weights.