I'm running a Core i7 3930k, which is of the Sandy Bridge microarchitecture. When executing the following code (compiled under MSVC19, VS2015), the results surprised me (see in comments):
int wmain(int argc, wchar_t* argv[])
{
uint64_t r = 0b1110'0000'0000'0000ULL;
uint64_t tzcnt = _tzcnt_u64(r);
cout << tzcnt << endl; // prints 13
int info[4]{};
__cpuidex(info, 7, 0);
int ebx = info[1];
cout << bitset<32>(ebx) << endl; // prints 32 zeros (including the bmi1 bit)
return 0;
}
Disassembly shows that the tzcnt
instruction is indeed emitted from the intrinsic:
uint64_t r = 0b1110'0000'0000'0000ULL;
00007FF64B44877F 48 C7 45 08 00 E0 00 00 mov qword ptr [r],0E000h
uint64_t tzcnt = _tzcnt_u64(r);
00007FF64B448787 F3 48 0F BC 45 08 tzcnt rax,qword ptr [r]
00007FF64B44878D 48 89 45 28 mov qword ptr [tzcnt],rax
How come I'm not getting an #UD
invalid opcode exception, the instruction functions correctly, and the CPU reports that it does not support the aforementioned instruction?
Could this be some weird microcode revision that contains an implementation for the instruction but doesn't report support for it (and others included in bmi1
)?
I haven't checked the rest of the bmi1
instructions, but I'm wondering how common a phenomenon this is.