I am trying to measure how long the CPUID instruction takes to run.

Source:

#include <stdio.h>
#include <sched.h>

cpu_set_t mask;

CPU_ZERO(&mask);
CPU_SET(0, &mask);
sched_setaffinity(0, sizeof(mask), &mask);

static inline unsigned long long tick()
{
    unsigned long long d;
    asm volatile ("rdtsc" : "=A" (d));
    return d;
}

void cpuid(void)
{
    int i;

    for(i=0; i != 5; i++)
    {
         asm volatile ("cpuid");
    }
}

int main()
{
    long long bef;
    long long aft;
    long long dif;

    bef=tick();
    cpuid();
    aft=tick();
    dif=aft-bef;

    printf("%d\n", bef);
    printf("%d\n", aft);
    printF("%d\n", dif);

    return 0;
}

I am compiling with the following command:

gcc -D_GNU_SOURCE -o test test.c

I get errors on code that isn't in the file! For example:

test.c:6:1: error: expected identifier or '(' before 'do'
test.c:6:1: error: expected identifier or '(' before 'while'
test.c:7:1: error: expected identifier or '(' before '__extension__'
test.c:8:1: warning: data definition has no type or storage class [enable by def...
test.c:8:1: error: initializer element is not constant

The "def..." isn't the actual output; it is cut off because my terminal window is tiny. I'm working in ESXi.

Any help would be amazing!!!

FOR FUTURE READERS

User @Iwillnotexist Idonotexist is correct: use the following function for full x86 and x64 support.

static __inline__ unsigned long long rdtsc(void)
{
  unsigned hi, lo;
  __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
  return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}
user3078629
  • You're about to make the same mistake too many other people have made before you with `rdtsc` and the `=A` asm constraint... http://stackoverflow.com/questions/19941588/wrong-clock-cycle-measurements-with-rdtsc#comment29841300_19941588 – Iwillnotexist Idonotexist Apr 21 '14 at 18:44
  • I understand that, but am I right in saying that it doesn't matter whether RDTSC returns cycles or actual time? The difference indicates that time has passed either way; the only question is whether you measure it in actual time or in the number of cycles during that time. I could be wrong. If I am, can you possibly explain? – user3078629 Apr 21 '14 at 19:02
  • The problem isn't that, the problem is that RDTSC returns a 64-bit value, but it is _split_ into hi and lo halves in the registers `edx` and `eax`. In 32-bit mode, the `=A` constraint selects correctly the pair of registers `edx:eax`; In 64-bit mode, it won't. Instead you'll be reading `rax` only, whose upper half is zeros after an `rdtsc`. So when you subtract, sometimes you'll get negative values when you shouldn't have, and you won't be able to measure time differences more than a couple seconds, because you're only using the low 32 bits of the TSC counter. – Iwillnotexist Idonotexist Apr 21 '14 at 19:10
  • If you change your code to use the `rdtsc()` function version below `#elif defined(__x86_64__)` here (http://www.mcs.anl.gov/~kazutomo/rdtsc.html) instead of your own `tick()`, it will work on both 32- and 64-bit OSes everywhere. – Iwillnotexist Idonotexist Apr 21 '14 at 19:13
  • I have a new problem: porting this code to Windows :( Any ideas? – user3078629 Apr 21 '14 at 19:56
  • `unsigned __int64 __rdtsc();` in header `<intrin.h>`. MSDN (http://msdn.microsoft.com/en-us/library/twchhe95%28v=vs.90%29.aspx) is good for these things. – Iwillnotexist Idonotexist Apr 22 '14 at 05:22
  • Yes I had figured that. Just read your comment. Cheers for all the help. – user3078629 Apr 23 '14 at 09:10

2 Answers

You have code outside of a function. That's not allowed. Move the following into main:

CPU_ZERO(&mask);
CPU_SET(0, &mask);
sched_setaffinity(0, sizeof(mask), &mask);
ikegami

These statements should be inside main():

cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(0, &mask);
sched_setaffinity(0, sizeof(mask), &mask);

This should fix the compilation error.

Then, five iterations of CPUID are far too few to give meaningful results.

You can check this answer, but use two sequences of different lengths made up only of CPUID instructions. You need a longer loop, but not so long that memory fetches come into play.

I have run some tests with TEST defined between 5 and 1000; CPU affinity did not seem to influence the results on a quad-core:

#include <stdio.h>
#include <sched.h>

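/* Read the 64-bit time-stamp counter. RDTSC returns it split across
   EDX:EAX, so the halves are combined by hand; the "=A" constraint only
   selects that register pair in 32-bit mode. */
static inline unsigned long long tick(void) {
    unsigned hi, lo;
    asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}

/* Execute CPUID (leaf 0) TEST times. The constraints tell the compiler
   that CPUID reads EAX/ECX and overwrites EAX, EBX, ECX and EDX. */
static inline void cpuid(void) {
    int i;
    unsigned a, b, c, d;
    for(i=0; i != TEST; i++) {
         a = 0; c = 0;
         asm volatile ("cpuid" : "+a" (a), "=b" (b), "+c" (c), "=d" (d));
    }
}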

int main()
{
    long long bef, aft, dif;

    bef=tick();
    cpuid();
    aft=tick();
    dif=(aft-bef)/TEST;

    printf("%lld\n", dif);

    return 0;
}

gcc -O0 -DTEST=100 -D_GNU_SOURCE -W -Wall -o time time.c && ./time
LSerni
  • OK, so moving the affinity code into main works. What I am trying to do is measure the time it takes for CPUID to execute, as this is an instruction that leaves the virtual environment. So I am hoping to see the time increase when the instruction is executed from within a VM. I am hoping the increase will be drastic, as my research is aimed at identifying a virtual environment from an executable within the VM. – user3078629 Apr 21 '14 at 18:46
  • I am also seeing negative values returned from RDTSC on an inconsistent basis! That is why I decided to set CPU affinity, in case the code was being moved between cores, which would skew the results. – user3078629 Apr 21 '14 at 18:51
  • I'm seeing a dif of around 80000. Is that significantly large? – user3078629 Apr 21 '14 at 18:53
  • @user3078629 Did you see my comment above w.r.t. the use of the asm constraint `=A` within `asm volatile ("rdtsc" : "=A" (d));` and why it is wrong? Make sure you change that. For instance, the x86_64 solution here (http://www.mcs.anl.gov/~kazutomo/rdtsc.html) is one way. – Iwillnotexist Idonotexist Apr 21 '14 at 19:02