3

I have some code that relies on AVX.
In the same code base I also use TZCNT.
The latter is part of BMI1. I know I can test for this instruction using CPUID, but I'm lazy so I did not actually implement that.

To test for support I simply execute an AVX instruction. If I get a #UD (undefined instruction) exception back, I know the CPU does not support AVX.
However tzcnt is (kind of) backwards compatible with bsf (or bsr - I always forget which is which), so it will not trigger an exception.

If I have AVX support, does that imply BMI1 support?
For the record, I do not have AVX2 on the CPU that I'm testing with right now.

Johan
    Even if you don't want to test for BMI support, you should usually use `tzcnt` (`rep bsf`) when you don't care about the behaviour with input=0. `tzcnt` is a lot faster than `bsf` on AMD CPUs. And on Intel Skylake (and later?) it avoids the false dependency on the write-only destination register that `bsf` has. (`popcnt` still has the false dep on SKL, like `lz/tzcnt` on earlier Intel CPUs.) – Peter Cordes Jun 30 '17 at 02:05

1 Answer

4

No, AVX support does not imply BMI1 support.

See the following table for details:

          Intel           AMD                  Year
---------------------------------------------------
AVX       Sandy Bridge    Bulldozer            2011
---------------------------------------------------
BMI1      Haswell         Piledriver/Jaguar    2013
---------------------------------------------------
ABM                       Barcelona            2007
          Haswell                              2013
---------------------------------------------------
AVX2      Haswell                              2013
                          Carrizo              2015
                          Ryzen                2017
---------------------------------------------------
BMI2      Haswell                              2013
                          Excavator            2015
                          Ryzen                2017

Most processors support both, but AVX predates BMI1 by two years.
Add to this the fact that tzcnt and bsf have different flag semantics: tzcnt sets CF when the input is zero and ZF when the result is zero, while bsf sets ZF when the input is zero and leaves the destination undefined.
If you want to force a #UD exception, you can use andn.

Source: Wikipedia: BMI, AVX

If you want to use CPUID:

BMI1   -> CPUID.(EAX=07H, ECX=0H):EBX.BMI1[bit 3]
          (ANDN, BEXTR, BLSI, BLSMSK, BLSR, TZCNT)

BMI2   -> CPUID.(EAX=07H, ECX=0H):EBX.BMI2[bit 8]
          (BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX)

LZCNT  -> CPUID.(EAX=80000001H):ECX.LZCNT[bit 5]

POPCNT -> CPUID.(EAX=01H):ECX.POPCNT[bit 23]

Note that even if CPUID indicates that an (Intel) processor does not support popcnt, it often does.

Johan
  • In case you'd like to add them to your answer: **BMI1** *(ANDN, BEXTR, BLSI, BLSMSK, BLSR, TZCNT)* -> `CPUID.(EAX=07H, ECX=0H):EBX.BMI1[bit 3]` **BMI2** *(BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX)* -> `CPUID.(EAX=07H, ECX=0H):EBX.BMI2[bit 8]` **LZCNT** -> `CPUID.EAX=80000001H:ECX.LZCNT[bit 5]`. This is Intel terminology: `CPUID.EAX=80000001H:ECX.LZCNT[bit 5]` denotes ABM (i.e. `popcnt` +`lzcnt`) on AMD processors (Since `popcnt` has it's own CPUID bit and ABM -> `popcnt` but not viceversa). – Margaret Bloom Jun 29 '17 at 15:15
  • 1
    I'm not sure if `andn` traps everywhere. The VEX prefix is just lds after all, I'm not sure if all old CPUs raise an exception when they see an lds with an invalid operand combination. Though, in 64 bit mode, this shouldn't be an issue as lds and les are illegal. – fuz Jun 29 '17 at 23:58
  • 1
    Which Intel CPUs can execute `popcnt` correctly but don't set the CPUID feature bit? It faults on my first-gen Core2 (Conroe/Merom: SSSE3 but not SSE4.1). – Peter Cordes Jun 30 '17 at 02:11
  • 1
    Virtual machines can expose any combination of CPUID flags they want. Although it's a lot more likely that they'd enable BMI without AVX than vice versa, because AVX implies extra architectural state. So when convenient, you should avoid making assumptions based on what hardware has been released. (It's relevant for deciding what versions of a function to create, though. e.g. it makes sense to avoid depending on BMI in an AVX version of a function. But you should test for all relevant feature bits when doing runtime CPU detection.) – Peter Cordes Jun 30 '17 at 02:13
  • You just do the CPU feature detection once, at initialization time, and set flags in a global bitfield. That way, it's very cheap and you don't ever have to worry about the cost of checking. – Cody Gray - on strike Jun 30 '17 at 09:17
  • @CodyGray, I do that normally, but when performance testing short snippets. It's just easier to rely on an #UD exception which get automatically logged. – Johan Jun 30 '17 at 09:26