You're getting a compile error on:
long long int *matrix = (long long int *) malloc(NUM_ROWS * NUM_COLS * sizeof(long long int));
The error is:
orig.c:21:60: warning: integer overflow in expression of type ‘int’ results in ‘-259084288’ [-Woverflow]
orig.c:21:44: warning: argument 1 value ‘18446744071636877312’ exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
That's because you put the sizeof
as the last term, so it used int
(32 bits) instead of size_t
[probably 64 bits].
Thus, the size
argument actually passed to malloc
gets wrapped/truncated to 32 bits before it is passed as an argument. So, you'll only allocate a much smaller area than you expect.
This probably results in UB (undefined behavior) because the rest of the code will assume the larger area and will modify memory beyond the end of the actual allocation.
To fix, do this:
long long int *matrix = malloc(sizeof(*matrix) * NUM_ROWS * NUM_COLS);
This will force the entire argument expression to be 64 bits [which is what you want/need]. You should do similar changes for your other malloc
calls.
Edit: Oops, I missed the fact that your allocation is so large that it can't be fulfilled by malloc
[which was flagged by the second error message]. You'll have to cut down the array sizes [see below]
BTW, don't cast the return from malloc
: Do I cast the result of malloc?.
And, using the sizeof(*matrix)
trick is much more likely to produce robust code.
Also, you should definitely check the return value from malloc
for NULL
and abort if so. You're allocating a huge array. Your system may not have enough memory (either physical memory or logical memory if you've defined swap areas).
Since this check has to be repeated, consider using a wrapper function:
void *
safe_malloc(size_t size)
{
void *ptr;
ptr = malloc(size);
if (ptr == NULL) {
fprintf(stderr,"safe_malloc: malloc failure -- size=%zu -- %s\n",
size,strerror(errno));
exit(1);
}
return ptr;
}
And, replace the [fixed/adjusted] malloc
calls with calls to safe_malloc
.
I sometimes also use a macro to further reduce repeating boilerplate:
#define DEFINE_AND_ALLOC_VECTOR(_typ,_ptr,_count) \
_typ *_ptr = safe_malloc(sizeof(*_ptr) * (_count))
Then, you can do:
DEFINE_AND_ALLOC_VECTOR(long long int,matrix,NUM_ROWS * NUM_COLS);
UPDATE:
what about long long int*sequential_result = (long long int*)malloc(NUM_ROWS * 1 * sizeof(long long int ));
No, I wouldn't do that for a few reasons. It violates the DRY (don't repeat yourself) principle [because you repeated long long int
three times.
Under C, casting the return value of malloc
which returns a void *
is cruft and can silently mask a bug [if you inadvertently fail to do #include <stdlib.h>
the compiler will default the return type to int
and truncate the return value to 32 bits].
Suppose you changed the type of sequential_result
from long long int
to (e.g. long double
). But, didn't change the [bad] cast type or the sizeof
.
Now, you've got a bug:
long double *sequential_result = (long long int *) malloc(sizeof(long long int) * NUM_COLS);
Here not enough space was allocated.
That's why experienced programmers [usually] prefer the sizeof(*ptr)
Further tip: The preferred grouping is:
long long int *sequential_result = ...;
And, not:
long long int* sequential_result = ...;
That's because if you have:
int* a,b;
That seems to be the equivalent of:
int *a;
int *b;
But, it is actually:
int *a;
int b;
The asterisk binds tightly right-to-left and not left-to-right. So, doing int*
is deceptive.
UPDATE #2:
Can you please rewrite the program?
Here is the refactored code. I've cut down the dimensions by a factor of 100 to fit into a sane amount of memory.
From your question, it seems you want large arrays so you can benchmark for an extended period of time.
I do a lot of code optimization and benchmarking. Smaller size arrays will give you accurate timing data. For a simple matrix multiply, about 10 seconds is sufficient.
I've used cpp
conditionals to show old vs. new code:
#if 0
// old [broken] code
#else
// new [fixed] code
#endif
Anyway, here is the refactored code [as I would write it]:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <time.h>
#if 0
#define NUM_ROWS 600000 // M dimension
#define NUM_COLS 200000 // N dimension
#else
#define NUM_ROWS 6000 // M dimension
#define NUM_COLS 2000 // N dimension
#endif
#define DEFINE_AND_ALLOC_VECTOR(_typ,_ptr,_count) \
_typ *_ptr = safe_malloc(sizeof(*_ptr) * _count)
typedef long long int bignum_t;
double
tscgetf(void)
{
struct timespec ts;
double sec;
clock_gettime(CLOCK_MONOTONIC,&ts);
sec = ts.tv_nsec;
sec /= 1e9;
sec += ts.tv_sec;
return sec;
}
void *
safe_malloc(size_t size)
{
void *ptr;
ptr = malloc(size);
if (ptr == NULL) {
fprintf(stderr,"safe_malloc: malloc failure -- size=%zu -- %s\n",
size,strerror(errno));
exit(1);
}
return ptr;
}
int
main(int argc, char **argv)
{
double start, stop;
start = tscgetf();
//int myid;
//int numprocs;
size_t i;
size_t j;
//bignum_t local_row_sum;
DEFINE_AND_ALLOC_VECTOR(bignum_t,matrix,NUM_ROWS * NUM_COLS);
DEFINE_AND_ALLOC_VECTOR(bignum_t,vector,NUM_COLS);
DEFINE_AND_ALLOC_VECTOR(bignum_t,local_result,NUM_ROWS);
DEFINE_AND_ALLOC_VECTOR(bignum_t,final_result,NUM_ROWS);
bignum_t *mat;
// Initialize array and vector
#if 0
for (i = 0; i < NUM_ROWS; i++) {
for (j = 0; j < NUM_COLS; j++) {
matrix[i * NUM_COLS + j] = i + j;
}
}
#else
for (i = 0; i < NUM_ROWS; i++) {
mat = &matrix[i * NUM_COLS];
for (j = 0; j < NUM_COLS; j++)
mat[j] = i + j;
}
#endif
for (j = 0; j < NUM_COLS; j++)
vector[j] = j;
DEFINE_AND_ALLOC_VECTOR(bignum_t,sequential_result,NUM_ROWS);
#if 0
int k, l;
#else
size_t k, l;
#endif
#if 0
for (k = 0; k < NUM_ROWS; k++) {
sequential_result[k] = 0;
for (l = 0; l < NUM_COLS; l++) {
sequential_result[k] += *(matrix + k * NUM_COLS + l) *
*(vector + l);
}
}
#else
for (k = 0; k < NUM_ROWS; ++k) {
bignum_t acc = 0;
mat = &matrix[k * NUM_COLS];
for (l = 0; l < NUM_COLS; ++l)
acc += mat[l] * vector[l];
sequential_result[k] = acc;
#endif
printf("The %zu result: %lld\n", k, sequential_result[k]);
// Check that sequential result equals MPI result (0 is error code here)
}
stop = tscgetf();
fprintf(stderr, "Elapse time\t%.9f seconds\n", stop - start);
return 0;
}