I'm a bit confused about both instructions. First, let's set aside the special case where the scanned value is 0 (the result is undefined for `bsr`, and the operand size for `lzcnt`) — that difference is clear and not part of my question.
Let's take the binary value `0001 1111 1111 1111 1111 1111 1111 1111` (0x1FFFFFFF). According to Intel's spec, the result of `lzcnt` is 3, while the result of `bsr` is 28.
`lzcnt` counts leading zero bits; `bsr` returns the index of the highest set bit, i.e. the distance from bit 0 (the LSB). How can both instructions be considered equivalent, and how can `lzcnt` be emulated with `bsr` when BMI isn't available on the CPU? Or is bit 0 the MSB in the case of `bsr`? The operation pseudocode in Intel's spec also differs: one counts/indexes from the left, the other from the right.

Maybe someone can shed some light on this. I have no CPU without the BMI/`lzcnt` instruction, so I can't test whether the fallback to `bsr` produces the same result (the special case of scanning the value 0 never occurs in my code).