Intel's "Optimization Reference Manual" mentions a new CPU feature, "Fast Short REP CMPSB and SCASB", that could speed up string operations:
REP CMPSB and SCASB performance is enhanced. The enhancement applies to string lengths between 1 and 128 bytes long. When the Fast Short REP CMPSB and SCASB feature is enabled, REP CMPSB and REP SCASB performance is flat 15 cycles per operation, for all strings 1-128 byte long whose two source operands reside in the processor first level cache.
Support for fast short REP CMPSB and SCASB is enumerated by the CPUID feature flag: CPUID.07H.01H:EAX.FAST_SHORT_REP_CMPSB_SCASB[bit 12] = 1.
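For reference, the enumeration quoted above can be checked at run time. The following is only a minimal sketch, assuming GCC or Clang on x86 and their <cpuid.h> helper __get_cpuid_count; the function names fsrcs_bit_set and cpu_has_fsrcs are my own, not taken from the manual.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header (assumed toolchain) */
#endif

/* Hypothetical helper: tests bit 12 of the EAX value returned by
 * CPUID leaf 07H, subleaf 01H, as described in the quoted text. */
static bool fsrcs_bit_set(unsigned int eax_leaf7_sub1)
{
    return ((eax_leaf7_sub1 >> 12) & 1u) != 0;
}

/* Hypothetical helper: returns true if the running CPU enumerates
 * Fast Short REP CMPSB and SCASB. Returns false on non-x86 targets. */
static bool cpu_has_fsrcs(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Subleaf 0 of leaf 07H reports the highest valid subleaf in EAX;
     * __get_cpuid_count returns 0 if leaf 07H itself is unsupported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) || eax < 1)
        return false;

    /* Subleaf 1 is valid, so read it and test EAX bit 12. */
    if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
        return false;

    return fsrcs_bit_set(eax);
#else
    return false;
#endif
}
```

Calling cpu_has_fsrcs() from main and printing the result shows whether a given machine sets the flag. That only answers the question for machines I can test on, not which generation introduced it.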
The description of Fast Short REP MOVSB, by contrast, explicitly states which microarchitecture introduced it:
Beginning with processors based on Ice Lake Client microarchitecture, REP MOVSB performance of short operations is enhanced
However, I could not find any information about which CPU generation first supports "Fast Short REP CMPSB and SCASB".