-1

I am trying to do a large matrix multiplication, e.g. 1000x1000. Unfortunately, it only works for very small matrices. For the big ones, the program just turns on and that's all - no results. Here's the code:

#include <iostream>

using namespace std;

int main() {
    int matrix_1_row;
    int matrix_1_column;
    matrix_1_row = 10;
    matrix_1_column = 10;

    int** array_1 = new int* [matrix_1_row];
    // dynamically allocate memory of size matrix_1_column for each row
    for (int i = 0; i < matrix_1_row; i++)
    {
        array_1[i] = new int[matrix_1_column];
    }
    // assign values to allocated memory
    for (int i = 0; i < matrix_1_row; i++)
    {
        for (int j = 0; j < matrix_1_column; j++)
        {
            array_1[i][j] = 3;
        }
    }

    int matrix_2_row;
    int matrix_2_column;
    matrix_2_row = 10;
    matrix_2_column = 10;
    // dynamically create array of pointers of size matrix_2_row
    int** array_2 = new int* [matrix_2_row];
    // dynamically allocate memory of size matrix_2_column for each row
    for (int i = 0; i < matrix_2_row; i++)
    {
        array_2[i] = new int[matrix_2_column];
    }
    // assign values to allocated memory
    for (int i = 0; i < matrix_2_row; i++)
    {
        for (int j = 0; j < matrix_2_column; j++)
        {
            array_2[i][j] = 2;
        }
    }

    // Result
    int result_row = matrix_1_row;
    int result_column = matrix_2_column;
    // dynamically create array of pointers of size result_row
    int** array_3 = new int* [result_row];
    // dynamically allocate memory of size result_column for each row
    for (int i = 0; i < result_row; i++)
    {
        array_3[i] = new int[result_column];
    }


    // Matrix multiplication
    for (int i = 0; i < matrix_1_row; i++)
    {
        for (int j = 0; j < matrix_2_column; j++)
        {
            array_3[i][j] = 0;
            for (int k = 0; k < matrix_1_column; k++)
            {
                array_3[i][j] += array_1[i][k] * array_2[k][j];
            }
        }
    }


    //RESULTS
    for (int i = 0; i < result_row; i++)
    {
        for (int j = 0; j < result_column; j++)
        {
            std::cout << array_3[i][j] << "\t";
        }
    }


    // deallocate memory using delete[] operator 1st matrix
    for (int i = 0; i < matrix_1_row; i++)
    {
        delete[] array_1[i];
    }
    delete[] array_1;
    // deallocate memory using delete[] operator 2nd matrix
    for (int i = 0; i < matrix_2_row; i++)
    {
        delete[] array_2[i];
    }
    delete[] array_2;
    // deallocate memory using delete[] operator result
    for (int i = 0; i < result_row; i++)
    {
        delete[] array_3[i];
    }
    delete[] array_3;

    return 0;
}

Anyone have an idea how to fix it? At what point did I go wrong? I used pointers, dynamic memory allocation.

edwaan34
  • 11
  • 1
  • 1
    Instead of all manual memory allocation/deletion, you could store the data in `std::vector>`. – Eugene Oct 25 '21 at 18:46
  • 1
    First off: wrap all this in a class so you don't have to call `delete[]` manually. Second, matrices should only use a single allocation for the entire data, then do math to calculate the index (again, hidden within the class). Third, for matrix multiplication in specific, pay attention to what order you're accessing the memory in, because CPU cache matters a lot. – o11c Oct 25 '21 at 18:46
  • For my third point, see e.g. https://stackoverflow.com/a/7395643/1405588 – o11c Oct 25 '21 at 18:48
  • 2
    @Eugene Nah, a single `std::vector` is much better, maybe wrapped in a Matrix class – MatG Oct 25 '21 at 18:48
  • 1
    Please post a variant that doesn't work instead of one that works. (As far as I can see, that should finish in a matter of seconds.) – molbdnilo Oct 25 '21 at 19:03
  • Very simple, very effective example of MatG's point: https://stackoverflow.com/a/2076668 – user4581301 Oct 25 '21 at 19:10
  • Maybe [start from something that works](https://stackoverflow.com/a/69608341/8584929) rather than trying to reverse-engineer what’s wrong. Additionally, you may need [the Strassen algorithm](https://en.wikipedia.org/wiki/Strassen_algorithm) for very large data. Which has a [numerical stability issue](https://www.osti.gov/pages/servlets/purl/1356986), leading to even more fun and adventure. – Andrej Podzimek Oct 25 '21 at 19:47
  • When increased to 1000x1000, the program takes about 7.5 seconds on my 8 years old CPU. That's without optimisations. With -O2, about 1.5 seconds. Please provide some data about your environment. How you compile, what is your hardware, and how long it takes for several different sizes. – n. m. could be an AI Oct 25 '21 at 20:01

1 Answers1

0

Instead of working with arrays directly named as matrix, try something simple and scalable, then optimize. Something like this:

class matrix 
{ 
    private: 
    // sub-matrices 
    std::shared_ptr<matrix> c11;     
    std::shared_ptr<matrix> c12;     
    std::shared_ptr<matrix> c21;     
    std::shared_ptr<matrix> c22; 
 
    // properties 
    const int n; 
    const int depth; 
    const int maxDepth; 
 
    // this should be shared-ptr too. Too lazy. 
    int data[16]; // lowest level matrix = 4x4 without sub matrix 
 
 
    // multiplication memory 
    std::shared_ptr<std::vector<matrix>> m; 
     
    public: 
    matrix(const int nP=4,const int depthP=0,const int maxDepthP=1): 
        n(nP),depth(depthP),maxDepth(maxDepthP) 
    { 
        if(depth<maxDepth) 
        { 
            // allocate c11,c22,c21,c22 
            // allocate m1,m2,m3,...m7 
        } 
    } 
 
    // matrix-matrix multiplication 
    matrix operator * (const matrix & mat) 
    { 
        // allocate result 
 
        // multiply 
        if(depth!=maxDepth) 
        { 
            // Strassen's multiplication algorithm 
            *m[0] = (*c11 + *c22) * (*mat.c11 + *mat.c22); 
            ... 
            *m[6] = (*c12 - *c22) * (*mat.c21 + *mat.c22); 
 
            *c11 = *m[0] + *m[3] - *m[4] + *m[6]; 
            .. 
            *c22 = .. 
        } 
        else 
        { 
            // innermost submatrices (4x4) multiplied normally 
            result.data[0] = data[0]*mat.data[0] + .... 
            ... 
            result.data[15]= ... 
        } 
        return result; 
    } 
 
    // matrix-matrix adder 
    matrix operator + (const matrix & mat) 
    { 
        // allocate result 
 
        // add 
        if(depth!=maxDepth) 
        { 
            *result.c11 = *c11 + *mat.c11; 
            *result.c12 = *c12 + *mat.c12; 
            *result.c21 = *c21 + *mat.c21; 
            *result.c22 = *c22 + *mat.c22; 
        } 
        else 
        { 
            // innermost matrix 
            result.data[0] = ... 
        } 
        return result; 
    } 
}; 

This way, it costs less time-complexity and still looks simple to read. After it works, you can use single-block of matrix array inside of class to optimize for more speed, preferably only allocating once at root matrix and use

std::span 

for access from submatrices for newer C++ versions. It is even parallelizable easily as each matrix can distribute its work to at least 4 threads and they can to 16 threads, 64 threads, etc. But of course too many threads are just as bad as too many allocations and should be optimized in a better way.

huseyin tugrul buyukisik
  • 11,469
  • 4
  • 45
  • 97