
I am trying to implement the dynamic programming algorithm presented in this article to solve the Shortest Hamiltonian Path problem. The solution requires storing values in a 2D array called DP of size n x 2^n, where n is the number of nodes of the graph.

My graph has more than 100 nodes, but it is very sparse, so most of the elements of the matrix DP are +infinity. I could therefore store it using a sparse matrix library (treating zero elements as +infinity). For example, using Eigen:

Eigen::SparseMatrix<double> M(n, 1 << n);
M.reserve(5e8); // reserve 4GB of RAM since I'm not able to estimate the number of non-zeros

However, the maximum column index of this matrix is 2^n - 1 (in C++ code, `1 << n`), which is far larger than what is allowed for an index (if n = 100, for example). The operation `1 << n` is also no longer valid for very large n: I tried `cout << (1 << n) << endl;` and got 262144 as the result for n = 114.

So, while the number of non-infinity elements of my matrix can fit into memory, I don't know how to deal with the indices.

I would greatly appreciate any help with this problem. Thank you in advance!

f10w
  • The [`SparseMatrix`](https://eigen.tuxfamily.org/dox/classEigen_1_1SparseMatrix.html) template has three template parameters, the third of which is the storage index type (by default `int`). You could just set it to `long`. – Henri Menke Dec 08 '17 at 22:52
  • 1
    @HenriMenke OP used 2^100 as example of possible maximum index, which is way beyond the range of `long` of any platform I know. – Matteo Italia Dec 08 '17 at 23:00
  • @HenriMenke But according to this page: http://www.cplusplus.com/reference/climits/, `long` has the same max value as `int`. Even if it's larger I think it's still not enough for 2^100 :( Another problem when `n` is too large is that the operation `1 << n` is no longer valid. I tried `cout<<(1 << n)<<endl;` and got 262144 as result for n = 114. – f10w Dec 08 '17 at 23:00
  • @Khue Note the "or greater", which applies on Linux and osx. Anyways, "greater" still is just 2^64. – Baum mit Augen Dec 08 '17 at 23:04
  • 1
    That said, as usual, you can use boost. http://www.boost.org/doc/libs/1_65_1/libs/multiprecision/doc/html/boost_multiprecision/tut/ints/cpp_int.html – Baum mit Augen Dec 08 '17 at 23:04
  • What about using a map instead of a sparse matrix? Any key without a value could be assumed to be infinity – idontseethepoint Dec 08 '17 at 23:12
  • @BaummitAugen Thanks, but I'm not sure boost multi-precision can play with Eigen (or other) sparse matrix. – f10w Dec 08 '17 at 23:17
  • 1
    Dunno about Eigen, but in general, there's no reason why one shouldn't be able to use boost integers as indices in a sparse map. – Baum mit Augen Dec 08 '17 at 23:20
  • @idontseethepoint Thanks. That is indeed an option. But I guess element access in map is not very efficient? – f10w Dec 08 '17 at 23:20
  • @BaummitAugen I see the idea, thanks. If using map, do you think using boost integers as keys better than using just double? – f10w Dec 08 '17 at 23:26
  • 1
    @Khue Definitely. After 2^53 or so, `double` starts rounding integers to other integers. That's bad when it's supposed to be an index. However, you are right that a map as in `std::map` will most likely be slow, probably even very much so. – Baum mit Augen Dec 08 '17 at 23:29
  • For the keys you might make a new class that holds a vector of integers. Supposing the integers are 64 bit, the first value would be the first 64 bits of the number, the second int would be the second 64 bits and so on. std::unordered_map will probably be faster than std::map – idontseethepoint Dec 08 '17 at 23:36
  • @idontseethepoint Your first advice basically boils down to reimplementing some big integer type, which you shouldn't. There's even a decent chance boost will use some platform specific built in type, which is not beatable in software. For the second part: No, neither is a good implementation of a sparse matrix. – Baum mit Augen Dec 08 '17 at 23:40
  • That depends on the relative amount of time that will be spent altering vs. reading the matrix, a map is efficient for changing the matrix – idontseethepoint Dec 08 '17 at 23:45
  • @idontseethepoint Even then, measure. The poor data locality of those containers really tends to mess stuff up performancewise. – Baum mit Augen Dec 08 '17 at 23:53
  • @BaummitAugen Your comments are very valuable, thanks. What I need is a structure that can help me to efficiently insert new elements on the fly and at the same time check if certain elements exist. I'm thinking about this: instead of using a 2d structure, I use a vector with elements DP[i + N*s] where i go from 0 to n-1 and s go from 0 to 2^n - 1. I define another vector of indices `indices[]` for storing the indices of the non-infinity elements of `DP`. The values of indices are ordered, so finding an index takes O(log(L))... – f10w Dec 08 '17 at 23:58
  • Well, after typing all of this I realized that it's just too naive. An STL map should already be more efficient than that. – f10w Dec 08 '17 at 23:58
  • If inserting and checking for existing elements rather than iterating is the bulk of the work, the two standard map types are indeed worth a try. If one of them is fast enough, that's probably the easiest way to go. – Baum mit Augen Dec 09 '17 at 00:02
  • @BaummitAugen Thanks a lot! I'll get back about the performance. – f10w Dec 09 '17 at 00:07
  • (Tbh, I kinda regret talking about performance that much. Code for correctness first, optimize later. Premature optimization is the root of all evil.) – Baum mit Augen Dec 09 '17 at 00:44
  • @BaummitAugen For me that was a good thing (at least for my problem) ;) A good analysis at the beginning will lead to the right direction. – f10w Dec 09 '17 at 00:49
  • @BaummitAugen Please help https://stackoverflow.com/questions/47731764/c-using-pair-of-cpp-int-int-integers-as-key-in-unordered-map-where-cpp-in Thank you so much!! – f10w Dec 09 '17 at 18:27

0 Answers