1

I am trying to create a 2D matrix in C (basically a dynamically allocatable 2d array of any given size) in both the most efficient and clean way possible. I had implemented such a thing in a larger project I am working on, but was having issues, and was able to narrow it down to the following.

I decided to malloc a giant array (I called it data), and then make an array of pointers (i called it cell) to be able to address the data in the big array in such a way that would make sense in a two-dimensional context (as in matrix[x][y] instead of data[ugly pointer arithmetic each time].) I thought this would be a good idea because it only calls malloc once, and so it would be faster, also, the allocated memory is in one consecutive block, which I believe (not too knowledgeable here) is a really good thing on some systems because of overhead in keeping track of allocated memory blocks.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef struct {
    unsigned int sizeX;
    unsigned int sizeY;
    int **cell;
    int *data; /* FOR INTERNAL USE ONLY */
} matrix;

matrix * matrix_malloc(unsigned int, unsigned int);
void matrix_free(matrix *);
void matrix_zero(matrix *);
void matrix_print(matrix *);

int
main(int argc, char *argv[])
{
    int y, x;
    matrix  *theMatrix = NULL;

    if (argc != 3) {
        fprintf(stderr, "usage: %s sizeX sizeY\n", argv[0]);
        return 1;
    }

    x = atoi(argv[1]);
    y = atoi(argv[2]);

    if (x < 10 || y < 10) {
        fprintf(stderr, "usage: sizeX and sizeY must be >= 10\n");
        return 1;
    }

    if ((theMatrix = matrix_malloc(x, y)) == NULL)
        return 1;

    matrix_zero(theMatrix);
    /* lots of modification of the contents of the matrix would happen here */
    matrix_print(theMatrix);

    matrix_free(theMatrix);

    return 0;
}

matrix *
matrix_malloc(unsigned int sizeX, unsigned int sizeY)
{
    int i;
    matrix  *mat;

    if ((mat = malloc(sizeof(matrix))) == NULL) {
        return NULL;
    }

    if ((mat->data = malloc(sizeX * sizeY * sizeof(int))) == NULL) {
        free(mat);
        mat = NULL;
        return NULL;
    }
    if ((mat->cell = malloc(sizeX * sizeof(int *))) == NULL) {
        free(mat->data);
        free(mat);
        mat = NULL;
        return NULL;
    }
    mat->sizeX = sizeX;
    mat->sizeY = sizeY;
    for (i = 0; i < sizeX; i++) {
        mat->cell[i] = mat->data + mat->sizeX * i;
    }

    return mat;
}

void
matrix_free(matrix *mat) {
    free(mat->cell);
    free(mat->data);
    free(mat);
    mat = NULL;
}

void
matrix_zero(matrix *mat)
{
    memset(mat->data, 0, mat->sizeX * mat->sizeY * sizeof(int));
}

void
matrix_print(matrix *mat)
{
    unsigned int    x, y;

    for (x = 0; x < mat->sizeX; x++) {
        for (y = 0; y < mat->sizeY; y++)
            printf("%d ", mat->cell[x][y]);
        printf("\n");
    }
}

When I run the above program as ./a.out 10 10 there is no problem, but when I specify 30 20 instead of 10 10, I run into some issues.

On MacOSX (10.6.7) I get:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 540024880 540024880 540024880 540024880 540024880 808465461 943207474 875896880 875704368 540031032 
842216505 926168880 926425140 909719605 540031032 926234424 909325360 875896888 825438256 540160816 10 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

and then it exits properly.

On OpenBSD (4.7) I get this far:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

and then it just segfaults

My initial thought was that it was just some issue when allocating big enough blocks of memory that they cross page boundaries, but when I use 50 50 as the size, it runs fine.

I've narrowed it down this far, and tried googleing (not quite sure what it is I should be searching for though :| ) and asked a few of my friends, but this has them all stumped.

I found C. Segmentation Fault when function modifies dynamically allocated 2d array int matrix with pointers in C - memory allocation confusion but they were not relevant (as far as I can tell).

If somebody could please point me in the right direction, perhaps point out the problem or point me to some relevant documentation, I would be very grateful.

Community
  • 1
  • 1
user992364
  • 442
  • 3
  • 18
  • BTW: (stylistic) you don't need the pointer array. Just create a static function or macro to convert (x,y) indices to linear offsets. It basically generates the same code, but it saves space, removes a layer of indirection, and avoids trivial errors like these. (well: you could make the same error in a different way) – wildplasser Oct 12 '11 at 23:01
  • @wildplasser Would really be better to do pointer arithmetic each time I want to reference a cell than to have them all pre-calculated and only need to resolve a pointer (which I would probably need to do anyway) each time at the cost of `sizeof(* int) * number_of_rows` memmory? – user992364 Oct 12 '11 at 23:07
  • Arithmetic is cheaper than addressing an array (think: cache slots). Agreed: you'll have to address the struxt->size[xy] to do the calculation, but that will be probably in the cacheline anyway. CPU is for free, cache misses are expensive. – wildplasser Oct 12 '11 at 23:12

2 Answers2

4
for (i = 0; i < sizeX; i++) {
        mat->cell[i] = mat->data + mat->sizeX * i;
    }

One of these SizeX'es needs to be a sizeY.

wildplasser
  • 43,142
  • 8
  • 66
  • 109
3
for (i = 0; i < sizeX; i++) {
    mat->cell[i] = mat->data + mat->sizeX * i;
}

Imagine if sizeX is 100 and sizeY is 2. Here, you're laying out sizeX rows, 100 of them, each sizeX integers, 100 of them. Ooops.

That mat->sizeX should be mat->sizeY. You have sizeX rows, each with sizeY elements in them. So you need to skip forward sizeY integers to get to the next row.

David Schwartz
  • 179,497
  • 17
  • 214
  • 278