Can we use a different CPU architecture (and backend) for training (calibration) and inference of a quantized PyTorch model?
The only post I've found on this subject states:
"static quantization must be performed on a machine with the same architecture as your deployment target. If you are using FBGEMM, you must perform the calibration pass on an x86 CPU; if you are using QNNPACK, calibration needs to happen on an ARM CPU"
But there is nothing about this in the official documentation.
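To make the question concrete, here is a minimal eager-mode static quantization sketch of what I mean (TinyModel, the layer sizes, and the sample inputs are just placeholders for illustration):

import torch
import torch.nn as nn

class TinyModel(nn.Module):  # hypothetical model, stands in for the real one
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(8, 4)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = TinyModel().eval()

# Calibration machine: x86, FBGEMM backend
torch.backends.quantized.engine = 'fbgemm'
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
prepared = torch.quantization.prepare(model)
prepared(torch.randn(2, 8))  # calibration pass with representative data
quantized = torch.quantization.convert(prepared)

# Inference machine: could this instead run on an ARM device with
# torch.backends.quantized.engine = 'qnnpack', or must the backend
# (and CPU architecture) match the one used for calibration?
print(quantized(torch.randn(2, 8)))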