Getting completely consistent RDTSC counts is impractical. For example, you'd have to disable interrupts, as well as do the usual stuff like disabling turbo so the CPU runs at a constant speed after leaving idle.
(Note that RDTSC on modern CPUs counts fixed-frequency reference cycles, not actual core clock cycles; see Get CPU cycle count?)
You'd also have to warm up the caches, branch predictors, and everything else, and get the CPU up to its max clock speed before the first timed test. If you're timing tests separately, though, timing the first one as the "cold" state is actually useful.
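A minimal sketch of the "time the cold run separately, then warm up" idea, assuming GCC or Clang on x86 (where `<x86intrin.h>` provides `__rdtsc()`). On other targets it falls back to `clock_gettime(CLOCK_MONOTONIC)`. The names `read_ticks`, `workload`, `workload_calls`, `time_once` and `time_cold_then_warm` are all hypothetical, and the number of warm-up calls needed to reach max clock speed is not guaranteed; it's just a parameter here.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime in the fallback path */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* __rdtsc() on GCC/Clang */
#endif

/* Read a tick counter. On x86 this is RDTSC, which counts reference cycles,
   not core clock cycles. Elsewhere, fall back to CLOCK_MONOTONIC nanoseconds.
   Plain RDTSC is not serializing; real harnesses often add LFENCE or use
   RDTSCP so the timed code can't be reordered around the reads. */
static uint64_t read_ticks(void) {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical code under test; the counter just records how often it ran. */
static volatile uint64_t sink;
static unsigned long workload_calls;
static void workload(void) {
    uint64_t acc = 0;
    for (uint64_t i = 0; i < 10000; i++) acc += i * i;
    sink = acc;  /* volatile store so the loop isn't optimized away */
    workload_calls++;
}

typedef void (*work_fn)(void);

/* Time a single call of fn, in ticks. */
static uint64_t time_once(work_fn fn) {
    uint64_t start = read_ticks();
    fn();
    return read_ticks() - start;
}

/* Time the first call on its own as the "cold" case, then run `warmup`
   untimed calls (warming caches and branch predictors, and giving the CPU
   a chance to ramp up its clock), then store n warm timings in out[].
   Returns the cold timing. */
static uint64_t time_cold_then_warm(work_fn fn, size_t warmup,
                                    uint64_t *out, size_t n) {
    assert(out != NULL || n == 0);
    uint64_t cold = time_once(fn);
    for (size_t i = 0; i < warmup; i++)
        fn();
    for (size_t i = 0; i < n; i++)
        out[i] = time_once(fn);
    return cold;
}
```

Calling `time_cold_then_warm(workload, 4, samples, 8)` runs `workload` 13 times in total: one cold timed call, 4 untimed warm-up calls, and 8 warm timed calls.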
In practice, people don't disable interrupts; they just ignore high outliers on the assumption that an interrupt or something similar happened during that test run. You can't disable SMIs (System Management Mode) or NMIs anyway.
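One way to "ignore high outliers", sketched under the assumption that anything much slower than the typical run was hit by an interrupt, SMI, or NMI: sort the samples, and drop everything above `factor` times the (upper) median. `drop_high_outliers`, `mean_u64` and the `factor` threshold are hypothetical choices for illustration; reporting the minimum or the median directly are other common options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t, written to avoid overflow. */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort samples in place (ascending), then drop every sample larger than
   factor * upper median. Returns how many samples were kept; the kept ones
   are samples[0..kept-1]. factor >= 1 guarantees at least half are kept. */
static size_t drop_high_outliers(uint64_t *samples, size_t n, uint64_t factor) {
    assert(n > 0 && factor >= 1);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    uint64_t limit = samples[n / 2] * factor;
    size_t kept = n;
    while (kept > 0 && samples[kept - 1] > limit)
        kept--;
    return kept;
}

/* Integer mean of the first n samples. */
static uint64_t mean_u64(const uint64_t *samples, size_t n) {
    assert(n > 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / n;
}
```

For example, with samples {100, 102, 5000, 101, 99, 103, 98} and a factor of 2, the median is 101, so the 5000 sample is dropped and the integer mean of the remaining six is 100.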