0

I have two vectors a[100000] and b[100000]. I want to store a[i]*b[j] in a 100000 x 100000 matrix M. How can I do it in C++?

Sujay_K
  • 155
  • 1
  • 2
  • 10
  • If you have enough memory use [`std::vector`](http://en.cppreference.com/w/cpp/container/vector) of [`std::vector`](http://en.cppreference.com/w/cpp/container/vector), otherwise you could use a [memory mapped file](https://en.wikipedia.org/wiki/Memory-mapped_file). – Some programmer dude Feb 08 '16 at 09:56
  • No, I mean suppose I'm making a matrix T of order n*n where each element A(i,j)=A(i)*B(j) – Sujay_K Feb 08 '16 at 09:57
  • There are also many libraries which can handle [sparse matrixes](https://en.wikipedia.org/wiki/Sparse_matrix) in a space-efficient way. – Some programmer dude Feb 08 '16 at 09:58
  • @Joachim Pileborg No, I have to store it at runtime from the user – Sujay_K Feb 08 '16 at 10:01
  • 3
    So? Memory mapped files are still a valid way to store large amounts of run-time data (I'm guessing that's what you meant). Nothing says that the file has to be kept once you're done with the data. And sparse matrices or (if you have enough memory) a vector of vectors, or even a single vector are all still other valid possibilities. – Some programmer dude Feb 08 '16 at 10:11
  • @HumamHelfawi: `a*b` is usually interpreted as the inner product, a single scalar result created by summing `a[i] * b[i]`. – MSalters Feb 08 '16 at 11:44

5 Answers

3

You can use a std::vector<std::vector<your_type>> to store the result.

int rows = 100000, cols = 100000;

std::vector<std::vector<double>> result;
result.resize(rows);

for(int i=0; i<rows; i++) {
    result[i].resize(cols);
}

for(int i=0; i<rows; i++) {
    for(int j=0; j<cols; j++){
        result[i][j] = a[i] * b[j];
    }
}

Or you can use a linear algebra library, for example Eigen (you will likely have less code to write), which will surely be more efficient.
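A minimal sketch of the Eigen route (my illustration, not code from the answer; it assumes Eigen 3 is available): the outer product a * b.transpose() gives M(i, j) = a(i) * b(j) in a single expression.

#include <Eigen/Dense>

int main() {
    const int n = 1000; // the full 100000 x 100000 matrix of doubles would need ~80 GB
    Eigen::VectorXd a = Eigen::VectorXd::Random(n);
    Eigen::VectorXd b = Eigen::VectorXd::Random(n);
    Eigen::MatrixXd M = a * b.transpose(); // M(i, j) == a(i) * b(j)
    return 0;
}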

Pierre
  • 1,162
  • 12
  • 29
  • 3
    This is highly inefficient. It has terrible memory locality and 100,000 unnecessary dynamic allocations. Prefer a single, flat vector of size `rows*cols` and overlay a 2D indexing scheme at the interface layer. – Lightness Races in Orbit Feb 08 '16 at 10:56
  • 1
    At least the lack of `.reserve()` in each child vector you can (and should!) fix quite easily. – Lightness Races in Orbit Feb 08 '16 at 10:57
  • 1
    The "inefficiency" is in the order of 40 bytes per MB of data. (0.004%). In exchange, you don't need 40 GB of contiguous allocation. – MSalters Feb 08 '16 at 11:40
  • Or just create it in one go: `std::vector<std::vector<double>> result { rows, std::vector<double>(cols) };`. – MSalters Feb 08 '16 at 11:46
  • @MSalters: It's not just inefficient in memory volume required. There is an unnecessary performance problem at "initialisation" as well as an unnecessary performance problem at usage time, both of which I've addressed already. To me, this doesn't fall into the category of "premature optimisation; ignore it", but into the category of "there is literally no reason to use this naive and inherently slow technique". – Lightness Races in Orbit Feb 08 '16 at 17:30
  • Performance problem at initialization? No, filling 40 GB will be bandwidth-limited either way. Performance at usage time? Even less impact, as literally 99.999% of accesses are still linear. – MSalters Feb 09 '16 at 08:13
2

The NON-contiguity part of this answer should be re-researched. It may be wrong.

If you want to work with a big number of elements such as 100000*100000, I would not recommend using a vector of vectors, due to the "NON-contiguity" of the inner vectors' elements with respect to each other. Small push_backs may also result in a lot of scattered allocations.

I would use a single vector with a wrapper, as sketched below. See Clean ways to write multiple 'for' loops for more information.
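A rough sketch of that single-vector idea (my illustration with made-up names, not code from the linked answer): one contiguous buffer plus a thin wrapper that does the 2D indexing. Note that at 100000 x 100000 doubles this still needs about 80 GB, as discussed in the other answers.

#include <cstddef>
#include <vector>

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {} // one flat allocation
    double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }
private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// usage: Matrix m(rows, cols); m(i, j) = a[i] * b[j];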

Humam Helfawi
  • 19,566
  • 15
  • 85
  • 160
  • 1
    Why do you think non-contiguity is bad? It's probably useful here. – MSalters Feb 08 '16 at 11:41
  • mmmm to be honest with you, I have not given it deep thought.. it is something I have always heard. Anyway, I can think of some reasons. Like, you cannot apply something like `std::max_element` in one batch .. you have to go over each row separately. mmm I am rethinking. it is not about contiguity, because you could do this for std::list; it is about iterators.. mmmm again, I do not know! – Humam Helfawi Feb 08 '16 at 11:45
  • 1
    Contiguity helps if you can keep elements which are used together on the same page (within 4 KB). But elements in adjacent rows are separated by 400 kB in the contiguous case. And if you want the maximum element of a 40 GB matrix, you'd better check `std::thread::hardware_concurrency` to see how many threads you should be creating for that. Then divide the 100,000 rows over your threads, and call `std::max_element` on each row. – MSalters Feb 08 '16 at 11:52
  • Many thanks. I am going to read more about the method you mentioned – Humam Helfawi Feb 08 '16 at 12:12
1
    #include <vector>
    class C
    {
    public:
        C(const std::vector<double>& a_, const std::vector<double>& b_)
            :a(a_),b(b_){};
        double operator()(size_t i, size_t j) const { return a[i]*b[j]; }
    private:
         std::vector<double> a, b;
    };

What is the problem, actually?

The original question asks about a way to save C(i,j)=A(i)*B(j) to a matrix.

From an OOP point of view, such a matrix can be defined as an object with a method that takes two inputs (i and j) and returns a result (ret=A(i)*B(j)).

This could be implemented using nested array subscripts (c[i][j]), linear array indexing (c[i*100000+j]), or a function (c.get(i, j)). The third way could also be simplified to a functor (c.operator()(i, j), or just c(i, j)).

Then what?

If you agree with all of the above, then any of the three interfaces serves the purpose, or at least partially (as I mentioned in the comment, if the matrix is only required to provide random read access to its elements). We can then implement one of them; the third one is my choice.

Why do it that way?

My observation is that computing the return value is not expensive, so why not calculate the product "lazily" when it is actually accessed?

In this way, the storage is very efficient (memory usage is reduced from n^2 to 2n elements; for n = 100000 that is roughly 80 GB down to about 1.6 MB of doubles).

Hiding the multiplication in the getter function does not significantly increase access time (two memory accesses and one multiplication, compared with the ideal case of one memory access only; both cases are constant time, and this implementation is much more cache friendly because of the reduced size of the data).

So, instead of saving the product, just save the inputs, but calculate the product when a particular element is accessed.

What is missing?

Although manipulating this "matrix" is possible (by changing the members a and b), it does not allow changing an arbitrary element to an arbitrary value.

Member functions that implement array slicing (like c(0:10:end, 4)) are not present either, but they are feasible.
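For illustration only (my sketch, not part of the original class; the helper name column_slice is made up), a slice could reuse the lazy operator() defined above:

#include <cstddef>
#include <vector>

// Extract c(i, j) for a fixed column j and rows [i_begin, i_end).
std::vector<double> column_slice(const C& c, std::size_t j,
                                 std::size_t i_begin, std::size_t i_end) {
    std::vector<double> out;
    out.reserve(i_end - i_begin);
    for (std::size_t i = i_begin; i < i_end; ++i)
        out.push_back(c(i, j)); // each element is computed lazily as a[i]*b[j]
    return out;
}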

Test code

#include <iostream>

int main() {
    C c({1,2,3,4},{10,20,30,40}); // a={1,2,3,4}; b={10,20,30,40}
    std::cout << "3*30 = " << c(2,2); // c(2, 2) = a[2]*b[2] = 3*30 = 90
    return 0;
}

Demo

http://ideone.com/bZR7AU

user3528438
  • 2,737
  • 2
  • 23
  • 42
  • Well, I think this answer deserves the downvote, because the class is not really a matrix (it lacks random write access, e.g. you cannot add/multiply another arbitrary matrix of a compatible size to it and store the result back). But if you only read from this matrix then I don't think this is a bad solution. IMHO the question is itself an x-y problem. – user3528438 Feb 08 '16 at 15:05
  • It doesn't really answer the question, and there is a lack of explanation – Pierre Feb 08 '16 at 18:24
  • @Pierre explanation added. – user3528438 Feb 08 '16 at 19:07
  • I am not sure about the answer. Most likely calculating `A[i,j] = B[i] * C[j]` was only a step in a much bigger task. – Kijewski Feb 08 '16 at 23:10
0

Using an in-RAM std::vector<double> would probably not be feasible if you have less than 80 GB of RAM in your system (for a 100000×100000 matrix of doubles).

Here is how you would do this using an mmap'd file. Please see the inline comments:

#include <sys/mman.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

#define ROWS 1000
#define COLS 1000
#define FILENAME "./matrix.doubles"

int main(void)
{
    double (*matrix)[ROWS][COLS]; // pointer to our matrix
    int fd; // file descriptor of backing file

    // open backing file
    fd = open(FILENAME,
              O_CREAT | O_RDWR, // create (if absent) and/or read and writable
              S_IRUSR | S_IWUSR); // (only) user may read and write
    if (fd < 0) {
        perror("Could not open file");
        return 1;
    }

    if ((lseek(fd, sizeof(*matrix), SEEK_SET) == (off_t) -1) ||
        ftruncate(fd, sizeof(*matrix)) ||
        (lseek(fd, 0, SEEK_SET) == (off_t) -1)) {
        perror("Could not set file size.");
        return 1;
    }

    matrix = mmap(NULL, // I don't care were the address starts
                  sizeof(*matrix), // size of matrix in bytes
                  PROT_READ | PROT_WRITE, // readable and writable
                  MAP_SHARED, // share with the backing file, so writes reach the file
                  fd, // file descriptor of backing file
                  0); // offset
    if (matrix == MAP_FAILED) {
        perror("Could not mmap file.");
        return 1;
    }

    // operate on matrix
    for (unsigned row = 0; row < ROWS; ++row) {
        for (unsigned col = 0; col < COLS; ++col) {
            (*matrix)[row][col] = row * col;
        }
    }

    // close backing file
    munmap(matrix, sizeof(*matrix));
    close(fd);

    return 0;
}

This is pure C code. You could fancy up the code by using e.g. a std::array<std::array<double, COLS>, ROWS>& instead of a bare array, but I think the idea should be understandable.
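As a sketch of that idea (my adaptation, not the original author's code), the mapping step could view the same backing file through a nested std::array reference; the macros are replaced by constexpr constants here. Reusing mapped storage as a std::array like this is common practice but not formally guaranteed by the C++ standard.

#include <array>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

constexpr std::size_t ROWS = 1000;
constexpr std::size_t COLS = 1000;
using Matrix = std::array<std::array<double, COLS>, ROWS>;

int main() {
    int fd = open("./matrix.doubles", O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd < 0) { std::perror("open"); return 1; }
    if (ftruncate(fd, sizeof(Matrix)) != 0) { std::perror("ftruncate"); return 1; }

    void* mem = mmap(nullptr, sizeof(Matrix), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0); // MAP_SHARED so writes reach the file
    if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }
    Matrix& matrix = *static_cast<Matrix*>(mem);

    for (std::size_t row = 0; row < ROWS; ++row)
        for (std::size_t col = 0; col < COLS; ++col)
            matrix[row][col] = static_cast<double>(row * col);

    munmap(mem, sizeof(Matrix));
    close(fd);
    return 0;
}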

Kijewski
  • 25,517
  • 12
  • 101
  • 143
0

If you can compute a[i]*b[j] on the fly then you should do that, for two reasons:

  • Obtaining the results from a huge matrix may not be faster than computing the product of two double values on the fly.
  • A 100000x100000 double matrix requires 80 GB of storage (in memory or on disk), and some extra work might be needed to access the pre-computed data.

In my example below, I see a 30x speed-up (compiled in release mode using clang 3.8) if I compute the product of two double values on the fly for N=20000.

#include <cstddef>
#include <vector>

using std::size_t;

template <typename T>
void test_lookup(std::vector<T> &data, std::vector<size_t> &index,
                 std::vector<T> &results) {
    const size_t LOOP = index.size() / 2;
    for (size_t idx = 0; idx < LOOP; ++idx) {
        auto row = index[2 * idx];
        auto col = index[2 * idx + 1];
        results[idx] = data[col * LOOP + row];
    }
}

template <typename T>
void test_mul(std::vector<T> &x, std::vector<T> &y, std::vector<T> &results) {
    for (size_t idx = 0; idx < x.size(); ++idx) {
        results[idx] = x[idx] * y[idx];
    }
}
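For completeness, here is a hypothetical driver (my assumption about the setup, not the code the answer actually benchmarked) showing one way to exercise and time the two functions above. N is kept small here so the N*N lookup table stays manageable; the 30x figure above was reported for N=20000.

#include <chrono>
#include <iostream>
#include <random>

int main() {
    const size_t N = 2000;
    std::vector<double> x(N), y(N), results(N);
    std::vector<size_t> index(2 * N);   // (row, col) pairs for test_lookup
    std::vector<double> table(N * N);   // pre-computed products for test_lookup

    std::mt19937_64 gen(42);
    std::uniform_real_distribution<double> val(0.0, 1.0);
    std::uniform_int_distribution<size_t> pick(0, N - 1);
    for (size_t i = 0; i < N; ++i) { x[i] = val(gen); y[i] = val(gen); }
    for (auto &i : index) i = pick(gen);
    for (size_t col = 0; col < N; ++col)
        for (size_t row = 0; row < N; ++row)
            table[col * N + row] = x[row] * y[col]; // layout matching data[col * LOOP + row]

    auto time_ms = [](auto &&f) {
        auto t0 = std::chrono::steady_clock::now();
        f();
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    };

    std::cout << "lookup:     " << time_ms([&] { test_lookup(table, index, results); }) << " ms\n";
    std::cout << "on-the-fly: " << time_ms([&] { test_mul(x, y, results); }) << " ms\n";
    return 0;
}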
hungptit
  • 1,414
  • 15
  • 16