0

My task is to run the matrix vector multiplication c program for like 14-16 minutes. And I am only able that to run for like 20 seconds maximum. I also tried increasing the value of row and columns and also other bunch of stuff and I am not able to do it. Can you please help me?

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#define NUM_ROWS 600000 // M dimension
#define NUM_COLS 200000// N dimension
void main(int argc, char *argv[])
{
struct timeval start, stop, elapse; // declaring the structure elements
gettimeofday( &start, NULL ); // calling gettimeofday function to calculate time
long long int myid, numprocs, i, j;
long long int local_row_sum;
double starttime, endtime;
long long int *matrix = (long long int*)malloc(NUM_ROWS * NUM_COLS * sizeof(long long int));
long long int *vector = (long long int*)malloc(NUM_COLS * sizeof(long long int ));
long long int *local_result = (long long int *)malloc(NUM_ROWS * sizeof(long long int));
long long int *final_result = (long long int *)malloc(NUM_ROWS * sizeof(long long int ));

// Initialize array and vector
for (i = 0; i < NUM_ROWS; i++) {
    for(j = 0; j < NUM_COLS; j++) {
        matrix[i * NUM_COLS + j] = i + j;
    }
}

for (j = 0; j < NUM_COLS; j++) {
    vector[j] = j;
}
int k, l;
long long int*sequential_result = (long long int*)malloc(NUM_ROWS * 1 * sizeof(long long int ));

for (k = 0; k < NUM_ROWS; k++)
{
    sequential_result[k] = 0;
    for (l = 0; l < NUM_COLS; l++)
    {
        sequential_result[k] += *(matrix + k * NUM_COLS + l) * *(vector + l);
    }
  printf("The result: %lld\n", sequential_result[k]);
    // Check that sequential result equals MPI result (0 is error code here)
  }
 gettimeofday( &stop, NULL ); // calling gametimeofday function to stop counting the time taken
  timersub( &stop, &start, &elapse ); // calling this function to calculate the time difference
  fprintf( stderr, "Elapse time\t%g\n", elapse.tv_sec+0.000001*elapse.tv_usec );

}  
Zoe
  • 27,060
  • 21
  • 118
  • 148
eliesmith
  • 59
  • 6
  • 1
    I don't understand the task of "run the program for X minutes". Are you trying to measure its performance or something? Either add many more rows/columns or perform your multiplication many times in a loop and output the number of runs after the time limit expires. – paddy Apr 23 '21 at 01:27

1 Answers1

0

You're getting a compile error on:

long long int *matrix = (long long int *) malloc(NUM_ROWS * NUM_COLS * sizeof(long long int));

The error is:

orig.c:21:60: warning: integer overflow in expression of type ‘int’ results in ‘-259084288’ [-Woverflow]
orig.c:21:44: warning: argument 1 value ‘18446744071636877312’ exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]

That's because you put the sizeof as the last term, so it used int (32 bits) instead of size_t [probably 64 bits].

Thus, the size argument actually passed to malloc gets wrapped/truncated to 32 bits before it is passed as an argument. So, you'll only allocate a much smaller area than you expect.

This probably results in UB (undefined behavior) because the rest of the code will assume the larger area and will modify memory beyond the end of the actual allocation.

To fix, do this:

long long int *matrix = malloc(sizeof(*matrix) * NUM_ROWS * NUM_COLS);

This will force the entire argument expression to be 64 bits [which is what you want/need]. You should do similar changes for your other malloc calls.

Edit: Oops, I missed the fact that your allocation is so large that it can't be fulfilled by malloc [which was flagged by the second error message]. You'll have to cut down the array sizes [see below]

BTW, don't cast the return from malloc: Do I cast the result of malloc?.

And, using the sizeof(*matrix) trick is much more likely to produce robust code.

Also, you should definitely check the return value from malloc for NULL and abort if so. You're allocating a huge array. Your system may not have enough memory (either physical memory or logical memory if you've defined swap areas).

Since this check has to be repeated, consider using a wrapper function:

void *
safe_malloc(size_t size)
{
    void *ptr;

    ptr = malloc(size);

    if (ptr == NULL) {
        fprintf(stderr,"safe_malloc: malloc failure -- size=%zu -- %s\n",
            size,strerror(errno));
        exit(1);
    }

    return ptr;
}

And, replace the [fixed/adjusted] malloc calls with calls to safe_malloc.

I sometimes also use a macro to further reduce repeating boilerplate:

#define DEFINE_AND_ALLOC_VECTOR(_typ,_ptr,_count) \
    _typ *_ptr = safe_malloc(sizeof(*_ptr) * (_count))

Then, you can do:

DEFINE_AND_ALLOC_VECTOR(long long int,matrix,NUM_ROWS * NUM_COLS);

UPDATE:

what about long long int*sequential_result = (long long int*)malloc(NUM_ROWS * 1 * sizeof(long long int ));

No, I wouldn't do that for a few reasons. It violates the DRY (don't repeat yourself) principle [because you repeated long long int three times.

Under C, casting the return value of malloc which returns a void * is cruft and can silently mask a bug [if you inadvertently fail to do #include <stdlib.h> the compiler will default the return type to int and truncate the return value to 32 bits].

Suppose you changed the type of sequential_result from long long int to (e.g. long double). But, didn't change the [bad] cast type or the sizeof.

Now, you've got a bug:

long double *sequential_result = (long long int *) malloc(sizeof(long long int) * NUM_COLS);

Here not enough space was allocated.

That's why experienced programmers [usually] prefer the sizeof(*ptr)

Further tip: The preferred grouping is:

long long int *sequential_result = ...;

And, not:

long long int* sequential_result = ...;

That's because if you have:

int* a,b;

That seems to be the equivalent of:

int *a;
int *b;

But, it is actually:

int *a;
int b;

The asterisk binds tightly right-to-left and not left-to-right. So, doing int* is deceptive.


UPDATE #2:

Can you please rewrite the program?

Here is the refactored code. I've cut down the dimensions by a factor of 100 to fit into a sane amount of memory.

From your question, it seems you want large arrays so you can benchmark for an extended period of time.

I do a lot of code optimization and benchmarking. Smaller size arrays will give you accurate timing data. For a simple matrix multiply, about 10 seconds is sufficient.

I've used cpp conditionals to show old vs. new code:

#if 0
// old [broken] code
#else
// new [fixed] code
#endif

Anyway, here is the refactored code [as I would write it]:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <time.h>

#if 0
#define NUM_ROWS 600000                 // M dimension
#define NUM_COLS 200000                 // N dimension
#else
#define NUM_ROWS 6000                   // M dimension
#define NUM_COLS 2000                   // N dimension
#endif

#define DEFINE_AND_ALLOC_VECTOR(_typ,_ptr,_count) \
    _typ *_ptr = safe_malloc(sizeof(*_ptr) * _count)

typedef long long int bignum_t;

double
tscgetf(void)
{
    struct timespec ts;
    double sec;

    clock_gettime(CLOCK_MONOTONIC,&ts);
    sec = ts.tv_nsec;
    sec /= 1e9;
    sec += ts.tv_sec;

    return sec;
}

void *
safe_malloc(size_t size)
{
    void *ptr;

    ptr = malloc(size);

    if (ptr == NULL) {
        fprintf(stderr,"safe_malloc: malloc failure -- size=%zu -- %s\n",
            size,strerror(errno));
        exit(1);
    }

    return ptr;
}

int
main(int argc, char **argv)
{
    double start, stop;

    start = tscgetf();

    //int myid;
    //int numprocs;
    size_t i;
    size_t j;
    //bignum_t local_row_sum;

    DEFINE_AND_ALLOC_VECTOR(bignum_t,matrix,NUM_ROWS * NUM_COLS);
    DEFINE_AND_ALLOC_VECTOR(bignum_t,vector,NUM_COLS);
    DEFINE_AND_ALLOC_VECTOR(bignum_t,local_result,NUM_ROWS);
    DEFINE_AND_ALLOC_VECTOR(bignum_t,final_result,NUM_ROWS);

    bignum_t *mat;

    // Initialize array and vector
#if 0
    for (i = 0; i < NUM_ROWS; i++) {
        for (j = 0; j < NUM_COLS; j++) {
            matrix[i * NUM_COLS + j] = i + j;
        }
    }
#else
    for (i = 0; i < NUM_ROWS; i++) {
        mat = &matrix[i * NUM_COLS];
        for (j = 0; j < NUM_COLS; j++)
            mat[j] = i + j;
    }
#endif

    for (j = 0; j < NUM_COLS; j++)
        vector[j] = j;

    DEFINE_AND_ALLOC_VECTOR(bignum_t,sequential_result,NUM_ROWS);

#if 0
    int k, l;
#else
    size_t k, l;
#endif

#if 0
    for (k = 0; k < NUM_ROWS; k++) {
        sequential_result[k] = 0;
        for (l = 0; l < NUM_COLS; l++) {
            sequential_result[k] += *(matrix + k * NUM_COLS + l) *
                *(vector + l);
        }
    }
#else
    for (k = 0;  k < NUM_ROWS;  ++k) {
        bignum_t acc = 0;
        mat = &matrix[k * NUM_COLS];

        for (l = 0;  l < NUM_COLS;  ++l)
            acc += mat[l] * vector[l];

        sequential_result[k] = acc;
#endif

        printf("The %zu result: %lld\n", k, sequential_result[k]);
        // Check that sequential result equals MPI result (0 is error code here)
    }

    stop = tscgetf();
    fprintf(stderr, "Elapse time\t%.9f seconds\n", stop - start);

    return 0;
}
Craig Estey
  • 30,627
  • 4
  • 24
  • 48
  • what about `long long int*sequential_result = (long long int*)malloc(NUM_ROWS * 1 * sizeof(long long int ));`? – eliesmith Apr 23 '21 at 01:46
  • Why the DV??? This is a perfectly reasonable answer. The guidelines are that a good faith answer that is not _egregiously_ wrong should _not_ be DV'ed. – Craig Estey Apr 23 '21 at 02:12
  • I did not do that I cannot do that. I am still confused. Can you please rewrite the program? – eliesmith Apr 23 '21 at 02:15
  • I realized that _you_ didn't DV. The comment was for _whomever_ did the "drive by" DVing. – Craig Estey Apr 23 '21 at 02:17
  • Este can you please write the code with the changes that you were talking about? – eliesmith Apr 23 '21 at 02:25
  • Just be patient ;-) I'm doing a full cleanup as I would write the code. – Craig Estey Apr 23 '21 at 02:26
  • @ Graig Estey were you able to run the program for 14 minutes? Beas=use I think so I don't have to think about efficiency. Because that is my first target right now – eliesmith Apr 23 '21 at 03:41