I'm looking for code that gets slower when moved to a newer CPU. I know this is theoretically possible, but I'm having a hard time finding an example that actually works.
Some constraints:
It should be single threaded
It should be compiled for either i386 or the oldest x86_64 baseline, or be handwritten assembly
If compiled, it should be statically linked against all libraries so that libc can't load optimised versions of libraries at runtime
Clock cycles can be approximated as execution time multiplied by max frequency, or measured with a perf tool. Comparing cycles rather than wall-clock time rules out RISC code that would run blazingly fast on a 4 GHz Pentium 4 purely because of its clock speed.
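To make the cycle approximation concrete, here is a minimal sketch in C. It assumes a POSIX system with clock_gettime, and it assumes the CPU's max frequency is looked up and passed in by hand. The helper names (estimate_cycles, now_seconds, measure_cycles) are made up for illustration.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <time.h>

/* Approximate cycle count: elapsed seconds multiplied by clock frequency in Hz.
 * max_freq_hz is an assumption supplied by the caller (e.g. from the CPU's
 * spec sheet); nothing here measures the real frequency. */
static double estimate_cycles(double elapsed_seconds, double max_freq_hz)
{
    assert(max_freq_hz > 0.0);
    return elapsed_seconds * max_freq_hz;
}

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Time a single call of work() and convert the elapsed time to cycles. */
static double measure_cycles(void (*work)(void), double max_freq_hz)
{
    double start = now_seconds();
    work();
    return estimate_cycles(now_seconds() - start, max_freq_hz);
}
```

Calling measure_cycles(my_benchmark, 4.0e9) on each machine, with that machine's own max frequency, gives numbers that can be compared across CPUs. Frequency scaling and turbo make this only a rough estimate, so a hardware cycle counter from a perf tool is the safer cross-check.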
My current idea is to overload the instruction issue buffer with branches, but I have no idea how to implement that effectively. Other approaches are welcome. The more ways to sink perf, the better.
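As a starting point for the branch idea, here is a minimal sketch of a branch-dense loop in C. It is only a test harness under stated assumptions: it does not establish that any newer CPU actually runs it slower. The names branchy_work and KEEP_BRANCH are hypothetical, and the empty volatile asm is a GCC/Clang extension used here on the assumption that it keeps the compiler from replacing the branches with branch-free code.

```c
#include <assert.h>
#include <stdint.h>

/* xorshift32 pseudo-random generator; the state must never be zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Empty volatile asm (GCC/Clang extension). Placing it inside each
 * conditional body is meant to discourage the compiler from converting
 * the branch into cmov/setcc; check the disassembly to confirm. */
#define KEEP_BRANCH() __asm__ volatile("")

/* Runs four data-dependent conditional branches per iteration and
 * returns how many of them were taken. */
static uint32_t branchy_work(uint32_t iterations, uint32_t seed)
{
    assert(seed != 0); /* xorshift32 stays at zero forever if seeded with zero */
    uint32_t state = seed;
    uint32_t taken = 0;
    for (uint32_t i = 0; i < iterations; i++) {
        uint32_t r = xorshift32(&state);
        if (r & 1u) { KEEP_BRANCH(); taken++; }
        if (r & 2u) { KEEP_BRANCH(); taken++; }
        if (r & 4u) { KEEP_BRANCH(); taken++; }
        if (r & 8u) { KEEP_BRANCH(); taken++; }
    }
    return taken;
}
```

The branches depend on pseudo-random bits, so this mostly stresses branch prediction rather than issue width. A handwritten assembly loop would give tighter control over branch density and alignment, which is probably what the issue-buffer idea needs.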