I wrote some code to measure CPU cycles per byte (cpb). I'm getting a negative cpb and don't know why. It reports cpb = -0.855553 cycles/byte.

My pseudocode:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

uint64_t rdtsc(){
    unsigned int lo,hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

int main()
{
    long double inputsSize = 1024;
    long double counter = 1;

    long double cpuCycleStart = rdtsc();

    while(counter < 3s)
        function(args);

    long double cpuCycleEnd = rdtsc();

    long double cpb = ((cpuCycleEnd - cpuCycleStart) / (counter * inputsSize));

    printf("%Lf cycles/byte\n", cpb);

    return 0;
}

EDIT: improved code, results are the same (negative):

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

unsigned long rdtsc( void )
{
    unsigned long lo, hi;
    asm( "rdtsc" : "=a" (lo), "=d" (hi) );
    return( lo );
}

int main()
{
    long double counter;
    long double inputsSize = 1024;
    char *buff = createInput(inputsSize);

    long double cpuCycleStart = rdtsc();
    countDownTime(3.0);
    for(counter=1; !secondsElapsed; counter++)
        function(args);
    long cpuCycleEnd = rdtsc();

    long double cpb = ((cpuCycleEnd - cpuCycleStart) / (counter * inputsSize));

    printf("%Lf cycles/byte\n", cpb);

    return 0;
}

It's really strange. I wrote some test code:

printf("\n%lu cpuCycleEnd \n%lu cpuCycleStart \n", cpuCycleEnd, cpuCycleStart);
printf("\n%lu counter\n%lu inputsSize \n\n", counter, inputsSize);

long double cpb = (((long double)cpuCycleEnd - (long double)cpuCycleStart) / ((long double)counter * (long double)inputsSize));

printf("%Lf cycles/byte\n", cpb);

which shows:

30534991 cpuCycleEnd 
1139165971 cpuCycleStart 

1273029 counter
1024 inputsSize 

-0.850450 cycles/byte

Any ideas?

nullpointer
  • What are you expecting `while(counter < 3s)function(args);` to do, if you never update `counter`? Why does it say `3s`? – Dan Jul 31 '13 at 12:22
  • Why are you storing a `uint64_t` into a `long double`? – Dan Jul 31 '13 at 12:24
  • Incidentally, on modern processors, `rdtsc` is defined to measure real time (wall-clock time), not processor time. Intel changed the specification years ago. It will not measure processor cycles in the presence of processor speed changes or various power states. – Eric Postpischil Jul 31 '13 at 12:27
  • @Dan: improved code but still gives me negative results. Yes, I've updated counter in my first code. – nullpointer Jul 31 '13 at 12:31
  • @EricPostpischil: yes, I know that but tested it earlier on a different function and results were ok :/ – nullpointer Jul 31 '13 at 12:33
  • Before the existing `printf`, print the values of `cpuCycleEnd`, `cpuCycleStart`, `counter`, and `inputsSize` separately. If `counter` or `inputsSize` are negative, figure out why and fix them. If `cpuCycleEnd` is less than `cpuCycleStart`, figure out why. Did the counter wrap? Are they close to other values returned by `rdtsc` calls (if you insert more calls to see)? Is `unsigned long` 64 bits in your C implementation? If you print the value of `rdtsc` as an `unsigned long`, is it the same value that is printed after converting it to `double`? – Eric Postpischil Jul 31 '13 at 12:37
  • Note that 64-bit `double` cannot store all the bits of a 64-bit `unsigned long`. If the `rdtsc` values have some of the high nine bits set, you may be getting rounding errors. This should result in losing precision but not negative values (the effects of rounding should be monotonic) in the subtraction, until the counter wraps. In any case, it is better to store `rdtsc` values as `uint64_t` until after they are subtracted, then convert the result of subtracting to `double` if desired. – Eric Postpischil Jul 31 '13 at 12:40
  • I get a negative result (often) when I compile and execute for i386 (`unsigned long` is 32 bits) but not when I compile and execute for x86_64 (`unsigned long` is 64 bits). Check that your build target is a 64-bit target. Add a statement `printf("sizeof(unsigned long) is %zu bytes.\n", sizeof(unsigned long));` and see whether it prints eight. It would be preferable to `#include <stdint.h>` and use `uint64_t` instead of `unsigned long`. Also, I had to make a number of modifications to compile. If the problem persists, please post a [self-contained compilable example](http://sscce.org). – Eric Postpischil Jul 31 '13 at 12:46
  • @EricPostpischil: added some updates to my post. What do you suggest about types? How should I change them for all the variables I use? Btw, the `sizeof(unsigned long)` check prints 4 bytes. – nullpointer Jul 31 '13 at 12:48

1 Answer

You are compiling for a target in which unsigned long is 32 bits.

You should #include <stdint.h> and use uint64_t instead of unsigned long. Additionally, you may want to compile for a target in which unsigned long is 64 bits.

(Note: To print a uint64_t, #include <inttypes.h> and use printf("%" PRIu64 "\n", value);.)

Eric Postpischil
  • I get: `error: impossible register constraint in ‘asm’` when I changed it to `uint64_t`. – nullpointer Jul 31 '13 at 12:52
  • @nullpointer: This works for me: `uint64_t lo, hi; __asm__("rdtsc" : "=a" (lo), "=d" (hi)); return hi << 32 | lo;` when compiling for `-arch x86_64`. If you are compiling for `-arch i386`, then you may need to make `lo` and `hi` 32-bit integers (`uint32_t`) and use `return (uint64_t) hi << 32 | lo;`. – Eric Postpischil Jul 31 '13 at 13:04
  • Thanks, this worked: http://pastie.org/private/7n1q6ccagthqo70bvhmcq (hope it's finally OK now?). Also, another question: I finally have non-negative results, take a look: http://pastie.org/private/4c9taxfaljgft5spbyv3nq. Is everything OK now with the variable types? – nullpointer Jul 31 '13 at 14:53
  • Can I also use `uint64_t` and `uint32_t` on Windows, let's say with MinGW and Visual C++? – nullpointer Jul 31 '13 at 15:07
  • @nullpointer: If you are asking about using `uint64_t` and `uint32_t` in this `rdtsc` and timing code when building for different platforms, then, yes, you should seek to use these types, as long as they do not run into compiler issues (e.g., the use of `asm` and its operand constraints may be finicky). – Eric Postpischil Jul 31 '13 at 15:54
  • Use `unsigned long` for the inline asm, which is 64-bit on x86-64 (except on Windows or the x32 ABI (32-bit pointers)), but 32-bit on i386. i.e. it always fits in a single register. @nullpointer: see [Get CPU cycle count?](https://stackoverflow.com/q/13772567) for a safe inline-asm implementation. (Don't forget `volatile`, unless you want the compiler to reuse the same timestamp when optimizing!) – Peter Cordes Aug 18 '18 at 16:07