What I am looking for:

copyij(): dim=2048: elapsed=0.13 secs
copyji(): dim=2048: elapsed=0.97 secs

What I have tried already:

Using the del_sec variable, which represents the delayed seconds, and assigning it the delayed milliseconds divided by NNN. (This was a suggestion from another person. I do not know why he suggested dividing milliseconds by NNN, but it brought me closer.)

It made both copyij() and copyji() report an elapsed time of 0.3485 seconds apiece, which is close, but you know what they say about cigars.

What I think the problem is:

I know this has to do with the del_sec variable, as the printf call in the skeleton program (the one I am using) is supposed to print the elapsed time as a number with 3 decimal places (this, however, does not happen).

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <stdint.h>

#define NNN 2048

void copyij();
void copyji();
void init_mat();
int64_t time_diff();

int src[NNN][NNN], dst[NNN][NNN];

int main(int argc,char **argv) {

  int i,j,n;

  long int del_sec,del_msec;
  struct timeval tv_s,tv_e;

  init_mat();
  gettimeofday(&tv_s, NULL); 
  copyij();
  gettimeofday(&tv_e, NULL); 

  del_sec = del_sec/del_msec/NNN;
  /* fill here to compute elapsed time */

  printf("copyij(): dim=%d: elapsed=%ld.%03ld secs\n", NNN , del_sec , del_msec/1000);


  init_mat();
  gettimeofday(&tv_s, NULL); 
  copyji();
  gettimeofday(&tv_e, NULL); 

     del_sec = del_sec/del_msec/NNN;
  /* fill here to compute elapsed time */

  printf("copyji(): dim=%d: elapsed=%ld.%03ld secs\n", NNN , del_sec, del_msec/1000 );

  return 0;
}

void copyij(){
  int i,j;

      for(i = 0; i <NNN; i++)
         for(j=0; j < NNN; j++) 
            src[i][j] =+ 1; 
  /* fill here */

}

void copyji(){
  int i,j;
   for(i = 0; i < NNN; i++)   
      for(j = 0; j < NNN; j++)   
         dst[j][i] += 1; 

  /* fill here */

}

void init_mat(){
  int i,j;

  for (i=0;i<NNN;i++)
    for (j=0;j<NNN;j++) src[i][j] = dst[i][j] = 1;

}
The_Senate
  • Okay, this is PERFECT, because I just started playing around with the .tv_sec member. But I have a question: why am I dividing by 1000000? Does this have something to do with a measurement of time that I'm not accounting for? – The_Senate Sep 16 '18 at 19:55
  • @The_Senate: There are 1,000,000 microseconds in a second, so you divide by a million to get the correct fractional second value. Consider using `clock_gettime()` instead of `gettimeofday()`, but then the divider is 1,000,000,000.0, as there are a billion nanoseconds in a second (and the sub-second component is `tv_nsec`, not `tv_usec`). – Jonathan Leffler Sep 16 '18 at 20:01
  • From user3121023's comment, dividing `tv_usec` by 1000000 converts it to _fractional_ seconds [from microseconds] so it can be added to `tv_sec` (i.e. units are now the same). `del_sec` and `del_msec` are never initialized [not needed and might be zero, causing a divide-by-zero exception], so do `printf("dim=%d: elapsed=%.6f\n",dim,elapsed);` – Craig Estey Sep 16 '18 at 20:04
  • Alright, so I have tried an updated version (thanks to present company, of course) that gives me something close to the answer provided, and before I decide to call it a day I'm going to try it on Linux to see if it makes a difference. The number of seconds varies depending on when I compile, but it is ALWAYS the same for both functions for some reason: copyij prints 0.3108 secs, and so does copyji. The weird thing is that when I try to assign anything to del_msec using the usec members, everything goes to 0.0000. – The_Senate Sep 16 '18 at 21:59
  • Please note that an optimizing compiler will strip this code down to the pure printing, producing an output of "copyij(): dim=0: elapsed=0.000 secs". The reason is that a) the "computation" in `copyXX()` is not used, and b) it's so trivial that a smart compiler may deduce the end result at compile time. If you want to have measurements that are worth something, you must ensure that the result of your computation is actually used (for example in a `printf()` call), and that the result cannot be easily divined by the compiler. – cmaster - reinstate monica Sep 17 '18 at 11:09
  • Also, it's generally not advisable to perform measurements without compiler optimizations. The cruft that's generated by compilers without optimizations is unbelievable. I would suggest to always use either `-Os` or `-O2` when doing measurements. Or even higher levels, if your compiler supports them. `-O0` and `-O1` leave too much cruft in the compiled code to yield meaningful results. – cmaster - reinstate monica Sep 17 '18 at 11:13

1 Answer

In copyij you have

src[i][j] =+ 1;

That parses as src[i][j] = +1, which assigns 1 rather than incrementing. I think you meant

src[i][j] += 1;

Measuring Time

For keeping track of elapsed time, I recommend clock_gettime with a monotonic clock: CLOCK_MONOTONIC, or CLOCK_MONOTONIC_COARSE (Linux-specific; cheaper to read but with coarser resolution) if your system has it.

To find the time delta, you can use the following macro (from OpenBSD's sys/time.h, see the manual timespecsub(3)):

#define timespecsub(tsp, usp, vsp)                        \
    do                                                    \
    {                                                     \
        (vsp)->tv_sec = (tsp)->tv_sec - (usp)->tv_sec;    \
        (vsp)->tv_nsec = (tsp)->tv_nsec - (usp)->tv_nsec; \
        if ((vsp)->tv_nsec < 0)                           \
        {                                                 \
            (vsp)->tv_sec--;                              \
            (vsp)->tv_nsec += 1000000000L;                \
        }                                                 \
    } while (0)

For example:

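struct timespec start, end, delta;
clock_gettime(CLOCK_MONOTONIC, &start);
/* ... code to time ... */
clock_gettime(CLOCK_MONOTONIC, &end);
timespecsub(&end, &start, &delta);
// delta contains the time difference from start to end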

If you just want to benchmark the copy functions, then dividing by NNN doesn't make sense unless you want to benchmark the per-dim performance (row-wise for ij, column-wise for ji).

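To print struct timespec you can use %lld for tv_sec (cast it to long long, since time_t is not guaranteed to be that type) and %.9ld for tv_nsec, which zero-pads the nanoseconds to nine digits.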

Compiler Optimizations

To ensure the compiler doesn't optimize away your functions:

  1. Use volatile for the matrices. This prevents the compiler from optimizing away operations on them.
static volatile int src[NNN][NNN], dst[NNN][NNN];
  2. For GCC, use a compiler directive such as #pragma GCC optimize("O0") around the copy functions to prevent GCC optimizing the loops. See here for alternatives.
#pragma GCC push_options
#pragma GCC optimize("O0")

static void
copyij()
{
    // ...
}

static void
copyji()
{
    // ...
}

#pragma GCC pop_options

Without these precautions, in my testing GCC (v11.2.1 20210728) seems to optimize out the functions at -O3, but not at -O2 or below.

Results

Updated code (gist)

#include <assert.h>
#include <stdio.h>
#include <time.h>

#define NNN 2048
#define BENCH_COUNT 10
#define timespecsub(tsp, usp, vsp)                        \
    do                                                    \
    {                                                     \
        (vsp)->tv_sec = (tsp)->tv_sec - (usp)->tv_sec;    \
        (vsp)->tv_nsec = (tsp)->tv_nsec - (usp)->tv_nsec; \
        if ((vsp)->tv_nsec < 0)                           \
        {                                                 \
            (vsp)->tv_sec--;                              \
            (vsp)->tv_nsec += 1000000000L;                \
        }                                                 \
    } while (0)

static void benchmark(void (*f)(void), const char *name);
static void copyij();
static void copyji();
static void init_mat();

static volatile int src[NNN][NNN], dst[NNN][NNN];

int main(void)
{
    size_t i;
    printf("ij:\n");
    for (i = 0; i < BENCH_COUNT; i++)
    {
        benchmark(copyij, "copyij");
    }
    printf("ji:\n");
    for (i = 0; i < BENCH_COUNT; i++)
    {
        benchmark(copyji, "copyji");
    }
}

static void
benchmark(void (*f)(void), const char *name)
{
    struct timespec start, end, delta;

    init_mat();
    /* Keep the calls outside assert() so they still run under -DNDEBUG. */
    int rc = clock_gettime(CLOCK_MONOTONIC, &start);
    assert(rc != -1);
    f();
    rc = clock_gettime(CLOCK_MONOTONIC, &end);
    assert(rc != -1);

    timespecsub(&end, &start, &delta);
    printf("%s: NNN=%d: elapsed=%lld.%.9ld secs\n", name, NNN,
           (long long)delta.tv_sec, delta.tv_nsec);
}

#pragma GCC push_options
#pragma GCC optimize("O0")

static void
copyij()
{
    size_t i, j;

    for (i = 0; i < NNN; i++)
    {
        for (j = 0; j < NNN; j++)
        {
            src[i][j] += 1;
        }
    }
}

static void
copyji()
{
    size_t i, j;

    for (i = 0; i < NNN; i++)
    {
        for (j = 0; j < NNN; j++)
        {
            dst[j][i] += 1;
        }
    }
}

#pragma GCC pop_options

static void
init_mat()
{
    size_t i, j;

    for (i = 0; i < NNN; i++)
    {
        for (j = 0; j < NNN; j++)
        {
            src[i][j] = dst[i][j] = 1;
        }
    }
}

Output (i7-7500U @ 2.7 GHz, inside WSL, -O3)

ij:
copyij: NNN=2048: elapsed=0.020582100 secs
copyij: NNN=2048: elapsed=0.016620800 secs
copyij: NNN=2048: elapsed=0.016156000 secs
copyij: NNN=2048: elapsed=0.017765700 secs
copyij: NNN=2048: elapsed=0.016158500 secs
copyij: NNN=2048: elapsed=0.016127900 secs
copyij: NNN=2048: elapsed=0.016153200 secs
copyij: NNN=2048: elapsed=0.016337300 secs
copyij: NNN=2048: elapsed=0.016625900 secs
copyij: NNN=2048: elapsed=0.016512300 secs
ji:
copyji: NNN=2048: elapsed=0.055380300 secs
copyji: NNN=2048: elapsed=0.056751900 secs
copyji: NNN=2048: elapsed=0.055770200 secs
copyji: NNN=2048: elapsed=0.056378700 secs
copyji: NNN=2048: elapsed=0.057477700 secs
copyji: NNN=2048: elapsed=0.058508900 secs
copyji: NNN=2048: elapsed=0.058080200 secs
copyji: NNN=2048: elapsed=0.057968100 secs
copyji: NNN=2048: elapsed=0.058937900 secs
copyji: NNN=2048: elapsed=0.056836300 secs

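I suspect ji is slower because it operates on noncontiguous memory: C stores arrays in row-major order, so the inner loop of copyji steps through dst[j][i] one whole row (NNN ints) at a time and touches a different cache line on almost every access, whereas copyij walks consecutive addresses.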

esote