Neo optimizes inference through compilation, which is different from, and often orthogonal to, compression.
Compilation makes inference faster and lighter by specializing the prediction application, notably by: (1) changing the environment in which the model runs, in particular replacing training frameworks with the minimal set of math libraries needed for inference, (2) optimizing the model graph for prediction only, for example by fusing operators that can be grouped together, (3) specializing the runtime to make the best use of the specific hardware and instruction sets available on a given target machine. Compilation is not supposed to change the model's math, so it does not change the model's footprint on disk.
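To make the graph-optimization idea concrete, here is a minimal sketch of one classic prediction-only rewrite: folding a BatchNorm layer into the preceding convolution, so two operators become one. This is a generic illustration, not Neo's actual code; the helper name `fold_bn_into_conv` is made up, and the example uses PyTorch.

```python
import torch
import torch.nn as nn

def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm2d into the preceding Conv2d (hypothetical helper).

    At inference time, BN is an affine transform with frozen statistics,
    so it can be absorbed into the convolution's weights and bias.
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias.data
    return fused

# The fused operator produces the same outputs as the original pair.
conv, bn = nn.Conv2d(3, 8, 3).eval(), nn.BatchNorm2d(8).eval()
x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    assert torch.allclose(bn(conv(x)), fold_bn_into_conv(conv, bn)(x), atol=1e-5)
```

Note that the fused model computes exactly the same function with the same weight tensors on disk, which is the point: the graph got cheaper, not the math different.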
Compression makes inference faster by removing model weights or by making them smaller (quantization). Weights can be removed by pruning (dropping weights that have little influence on results) or by distillation (training a small model to mimic a big model).
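As a toy illustration of both ideas (not Neo functionality; the helper names are hypothetical), the NumPy sketch below prunes small-magnitude weights and then applies a simple symmetric 8-bit quantization.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 with a single symmetric scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale  # recover approximate floats with q * scale

w = np.random.randn(256, 256).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.5)  # half the weights become zero
q, scale = quantize_int8(w_pruned)           # int8 is 4x smaller than float32
print(q.nbytes, w.nbytes)                    # 65536 vs 262144 bytes
```

Unlike compilation, both steps change the model's math: outputs are approximations of the original model's, which is why compression usually calls for an accuracy check afterward.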
At the time of this writing, SageMaker Neo is a managed compilation service. That being said, compilation and compression can be combined: you can prune or distill your network before feeding it to Neo.
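For reference, a Neo compilation job can be launched with the boto3 SageMaker client's `create_compilation_job` call; in the sketch below, the job name, role ARN, S3 paths, and input shape are placeholders to adapt to your own model.

```python
import boto3

sm = boto3.client("sagemaker")

# All names, ARNs, and S3 paths below are placeholders.
sm.create_compilation_job(
    CompilationJobName="my-neo-job",
    RoleArn="arn:aws:iam::123456789012:role/MyNeoRole",
    InputConfig={
        "S3Uri": "s3://my-bucket/model/model.tar.gz",
        # Input name and shape expected by the model, as a JSON string
        "DataInputConfig": '{"input0": [1, 3, 224, 224]}',
        "Framework": "PYTORCH",
    },
    OutputConfig={
        "S3OutputLocation": "s3://my-bucket/compiled/",
        "TargetDevice": "ml_c5",  # one of Neo's supported hardware targets
    },
    StoppingCondition={"MaxRuntimeInSeconds": 900},
)
```

Nothing in this call cares whether the input artifact was pruned or distilled beforehand, which is what makes the two techniques composable.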
SageMaker Neo covers a broad matrix of hardware targets and model architectures, and consequently leverages numerous backends and optimizations. Neo's internals are publicly documented in many places: