I adopted the following timer code from an online source to measure SSE performance.
#ifndef __TIMER_H__
#define __TIMER_H__
#pragma warning (push)
#pragma warning (disable : 4035) // disable no return value warning
__forceinline unsigned int GetPentiumTimer()
{
    __asm
    {
        xor eax,eax // VC won't realize that eax is modified w/out this
                    // instruction to modify the val.
                    // Problem shows up in release mode builds
        _emit 0x0F  // Pentium high-freq counter to edx;eax
        _emit 0x31  // only care about low 32 bits in eax
        xor edx,edx // so VC gets that edx is modified
    }
}
#pragma warning (pop)
#endif
I ran the measurement on my Pentium D E2200 CPU, and it works fine (it shows that aligned SSE instructions are faster). But on my i3 CPU, the unaligned instructions come out faster in 70% of the tests.
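For reference, a comparison like the one described above could be structured roughly as follows. This is only a minimal sketch, not my exact test code. `ReadTicks` and `MinTicks` are hypothetical names, `std::chrono::steady_clock` stands in for the `GetPentiumTimer()` counter so the snippet stays portable, and taking the minimum over several repeats is an assumption about how to reduce noise from interrupts and cache warm-up.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for GetPentiumTimer(): a portable monotonic tick
// source (nanoseconds from steady_clock instead of raw RDTSC cycles).
inline std::uint64_t ReadTicks() {
    using clock = std::chrono::steady_clock;
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            clock::now().time_since_epoch()).count());
}

// Runs `work` `repeats` times and returns the smallest elapsed tick count.
// If `repeats` is zero or negative, nothing is run and the maximum
// uint64_t value is returned.
template <typename Fn>
std::uint64_t MinTicks(Fn&& work, int repeats) {
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < repeats; ++i) {
        const std::uint64_t start = ReadTicks();
        work();
        const std::uint64_t elapsed = ReadTicks() - start;
        best = std::min(best, elapsed);
    }
    return best;
}
```

The aligned and unaligned variants would each be passed to `MinTicks` as a callable with the same repeat count, and the two minimum values compared.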
Do you guys think this clock-tick measurement is unsuitable for the i3 CPU?