Differing CPUID usage from high-level languages

Question

I'm trying to use an x86 assembly routine that requires a particular processor feature. I understand that I need to check a specific bit after calling "CPUID standard function 01H". Below is a C implementation from the CPUID Wikipedia page for calling CPUID:

#include <stdio.h>

int main() {
    int i;
    unsigned int index = 0;
    unsigned int regs[4];
    int sum;
    __asm__ __volatile__(
#if defined(__x86_64__) || defined(_M_AMD64) || defined (_M_X64)
        "pushq %%rbx     \n\t" /* save %rbx */
#else
        "pushl %%ebx     \n\t" /* save %ebx */
#endif
        "cpuid            \n\t"
        "movl %%ebx ,%[ebx]  \n\t" /* write the result into output var */
#if defined(__x86_64__) || defined(_M_AMD64) || defined (_M_X64)
        "popq %%rbx \n\t"
#else
        "popl %%ebx \n\t"
#endif
        : "=a"(regs[0]), [ebx] "=r"(regs[1]), "=c"(regs[2]), "=d"(regs[3])
        : "a"(index));
    for (i=4; i<8; i++) {
        printf("%c" ,((char *)regs)[i]);
    }
    for (i=12; i<16; i++) {
        printf("%c" ,((char *)regs)[i]);
    }
    for (i=8; i<12; i++) {
        printf("%c" ,((char *)regs)[i]);
    }
    printf("\n");
}

The Linux kernel, however, uses the function below:

static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
                                unsigned int *ecx, unsigned int *edx)
{
        /* ecx is often an input as well as an output. */
        asm volatile("cpuid"
            : "=a" (*eax),
              "=b" (*ebx),
              "=c" (*ecx),
              "=d" (*edx)
            : "0" (*eax), "2" (*ecx));
}

Which one is better? Or are they essentially equivalent?

Inline asm is already gcc specific so you might as well use the [compiler builtin](http://stackoverflow.com/a/14266932/547981) instead. Other compilers also have similar builtins. — Jester, Jul 20 '16 at 15:45
I'm not sure 'builtin' is the right term here, since that normally refers to a function builtin to the compiler itself. cpuid.h just creates a define which is the inline asm. Also, unlike `native_cpuid` routine above, the `__cpuid` define doesn't allow you to input ecx. — David Wohlferd, Jul 20 '16 at 18:05

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

As Jester says, in GNU C the cpuid.h wrapper intrinsic is probably your best bet.
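
As a rough sketch of what using that header can look like, the helper below checks the RDRAND bit (bit 30 of ECX from leaf 1, the case raised in the comments). The function name `cpu_has_rdrand` is made up for illustration; `__get_cpuid` comes from `<cpuid.h>` in gcc and clang, and this assumes an x86 target.

```c
#include <cpuid.h>    /* GNU C header; x86 targets only */
#include <stdbool.h>

/* Hypothetical helper: true if CPUID leaf 1 reports RDRAND (ECX bit 30). */
static bool cpu_has_rdrand(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;

    return ((ecx >> 30) & 1u) != 0;
}
```

A caller would test `cpu_has_rdrand()` once at startup and pick the RDRAND path or a fallback accordingly.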

There's also __builtin_cpu_supports("popcnt") or "avx" or whatever, which works after you call __builtin_cpu_init(). Only the really major feature-bits are supported, though. For example, the docs don't mention the feature-bit for rdrand, so __builtin_cpu_supports("rdrand") probably doesn't work.
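
A minimal sketch of that approach, assuming gcc or clang on an x86 target. The helper name `count_major_features` and the choice of feature strings are illustrative only; note that the argument to `__builtin_cpu_supports` has to be a string literal.

```c
#include <stdio.h>

/* Hypothetical helper: prints and counts a few major feature bits. */
static int count_major_features(void)
{
    int count = 0;

    /* Initialize the compiler's CPU-feature data before querying it. */
    __builtin_cpu_init();

    if (__builtin_cpu_supports("popcnt")) { puts("popcnt"); count++; }
    if (__builtin_cpu_supports("avx"))    { puts("avx");    count++; }
    if (__builtin_cpu_supports("sse4.2")) { puts("sse4.2"); count++; }

    return count;
}
```

Which feature names are accepted depends on the compiler version, so it is worth checking the documentation for the one you build with.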

Custom inline-assembly versions:

The implementation from Linux can inline with no wasted instructions, and it looks well-written, so there's no reason to use anything else. It's remotely possible that you might get a complaint about not being able to satisfy the "=b" constraint; if so, see below for what clang's cpuid.h does. (But I think that's never necessary, and is the result of a documentation mistake.)

It doesn't actually need volatile, though, if you're using it for the values produced rather than the serializing effect on the pipeline: running CPUID with the same inputs will give the same result, so we can let the optimizer move it around or hoist it out of loops (so it runs fewer times). This is probably not helpful, though, because normal code won't use it in a loop in the first place.
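
For illustration, a non-volatile variant could look like the sketch below. The name `cpuid_novolatile` is made up; the constraints are copied from the kernel's `native_cpuid`, and only the `volatile` qualifier is dropped. This assumes you only care about the returned values, and it is worth remembering that a few leaves report per-core data (such as the APIC ID), so letting the compiler merge calls is only appropriate for leaves whose output does not depend on which core runs the instruction.

```c
/* Hypothetical variant of native_cpuid without volatile: the compiler may
 * merge or hoist calls that have identical inputs. */
static inline void cpuid_novolatile(unsigned int *eax, unsigned int *ebx,
                                    unsigned int *ecx, unsigned int *edx)
{
    /* ecx is an input (sub-leaf) as well as an output. */
    __asm__("cpuid"
            : "=a" (*eax),
              "=b" (*ebx),
              "=c" (*ecx),
              "=d" (*edx)
            : "0" (*eax), "2" (*ecx));
}
```

The caller sets `*eax` to the leaf and `*ecx` to the sub-leaf (0 if unused) before the call, exactly as with the kernel version.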

The source for clang's implementation of cpuid.h does some weird stuff, like preserving %rbx because apparently some x86-64 environments might not be able to satisfy a constraint that uses %rbx as an output operand? The comment is /* x86-64 uses %rbx as the base register, so preserve it. */, but I have no idea what they're talking about. If anything x86-32 PIC code in the SysV ABI uses %ebx for a fixed purpose (as a pointer to the GOT), but I don't know about anything like that for x86-64. Perhaps that code is motivated by a mistake in the ABI documentation? See HJ Lu's mailing list post about it.

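Most importantly, the first version in the question (inside main()) is broken because it clobbers the red-zone with push: in the x86-64 System V ABI the compiler is allowed to keep locals in the 128 bytes below %rsp without moving the stack pointer, so a push from inline asm can overwrite them.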

To fix it, just tell the compiler the result will be in ebx (with "=b"), and let it worry about saving/restoring ebx/rbx at the start/end of the function.
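
A sketch of that fix, assuming a GNU C compiler on x86: the push/pop pair is gone and EBX is simply declared as an output. The helper name `cpu_vendor` and the 13-byte buffer convention are illustrative, not taken from the Wikipedia code.

```c
#include <string.h>

/* Hypothetical helper: stores the 12-character vendor string plus a
 * terminating NUL into out, which must hold at least 13 bytes. */
static void cpu_vendor(char out[13])
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 0: the compiler saves and restores ebx/rbx itself if needed. */
    __asm__("cpuid"
            : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
            : "a" (0u));

    /* The vendor string is laid out in EBX, EDX, ECX order. */
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
}
```

Calling `cpu_vendor(buf)` and printing `buf` with `printf("%s\n", buf)` would take the place of the three byte-wise printf loops in the question.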

Thank you so much, Mr. Cordes. I've used assembly for an atmega128 and C of course for Linux OS but I've never actually combined the two- it's all been a bit intimidating haha My concern with the `cpuid.h` includes was the fact that `__get_cpuid()` isn't portable, if I understand it correctly. So I was looking for something more reliable. Thank you so much for your answer which touched on the two solutions I posted; I learned a lot — 8protons, Jul 21 '16 at 15:02
@8protons: I think `cpuid.h` is about as portable as GNU C inline-asm syntax. Both are supported in any GNU C compiler (gcc, clang, icc, some others), and not otherwise. Actually `cpuid.h` might be more recent and maybe not always supported, so IDK. — Peter Cordes, Jul 21 '16 at 20:32
CPUID is obviously not going to be *portable*, @8protons, as it is x86-specific. What exactly are your "portability" and "reliability" concerns? — Cody Gray - on strike, Jul 25 '16 at 23:30
@CodyGray Basically I want to use RDRAND() but not all CPUs support it. To check if the current CPU does, one sees if bit 30 in ECX is set when the CPUID standard function is called. (If the bit isn't set then I'll use an alternative to RDRAND) — 8protons, Jul 25 '16 at 23:35
