In C, as a general guideline and without sample code to look at, the main thing you can do to speed up matrix operations is to store each matrix in a contiguous block of memory. Contiguous data is read sequentially and makes good use of the processor's cache: small matrices may fit in cache entirely, and larger ones at least keep RAM traffic to a minimum. If you are allocating memory dynamically, request one whole block per matrix and then either compute the indexes arithmetically:
```c
/* Row-major layout: element (i, j) lives at offset i * columns + j. */
for (i = 0; i < rows; i++)
{
    for (j = 0; j < columns; j++)
    {
        matrix[i * columns + j] = do_whatever();
    }
}
```
or create an array of pointers to the beginning of each row if you prefer the standard [i][j] notation. That approach can cost some performance, because every access first loads a row pointer and then the element, so the processor has to deal with two arrays per matrix instead of one. If you are using ordinary fixed-size 2D arrays (e.g. double m[ROWS][COLS]), you don't have to worry about any of this: C already stores them contiguously in row-major order.
The other important improvement is to parallelize the calculations, splitting the work across multiple threads.
Matrix operations are inherently expensive (a naive multiplication of two n×n matrices takes O(n^3) arithmetic operations), and algorithmic shortcuts only apply when you can make certain assumptions about the data, such as symmetry or some other property that lets you skip operations.