
Dear Stack Overflow community:

I'm pretty new to Boost, and I'm trying to use the sparse matrix from its uBLAS library. After computing my sparse matrix, I want to store it in a binary file, then read that file from other programs and recover the sparse matrix. Normally I write files in the following way (say I have an array A with 100 floats):

std::ofstream ofsDat("file.dat", std::ofstream::out); 
ofsDat.write((char*)A, sizeof(float)*100);
ofsDat.close(); 

Can I do a similar write operation for a Boost sparse matrix? If so, what should the second argument of ofstream::write be? (It should be the size of the data chunk.)

YangLou

2 Answers


The Boost sparse matrix stores only the non-zero values, so there is no &A that points to a dense representation of the matrix. If you need binary output, you will have to construct it yourself on the fly. You don't need ios::out; it is implied for an ofstream. You should open the stream in binary mode:

std::ofstream test( "./file.dat", std::ios::binary );

Otherwise the ofstream is opened in text mode, which on some platforms (Windows, for example) translates newline bytes and can mangle your output. Then:

#include <iostream>
#include <fstream>
#include <boost/numeric/ublas/matrix_sparse.hpp>
#include <boost/numeric/ublas/storage.hpp>

namespace ublas = boost::numeric::ublas;

int main( )
{
    size_t width= 10;
    size_t depth= 10;
    ublas::compressed_matrix< double > m( width, depth );
    m(0, 0) = 9;
    m(1, 0) = 2;
    m(0, 1) = 3;
    m(5, 5) = 7;

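    // Open in binary mode so the bytes are written untranslated.
    std::ofstream test( "./file.dat", std::ios::binary );
    const double zero = 0.0;
    // Write the matrix as a dense block; size_t indices match the
    // unsigned type of width and depth.
    for( size_t i = 0; i < width; ++i )
        for( size_t j = 0; j < depth; ++j )
        {
            // find_element returns a null pointer when (i, j) is not stored.
            const double* temp = m.find_element( i, j );
            if( temp )
                test.write( (const char*)temp, sizeof( double ) );
            else
                test.write( (const char*)&zero, sizeof( double ) );
        }
    test.close( );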
}

But without writing more information to the file, the reader must already know the matrix's width and depth. I would look closely at what the program on the other end needs and see whether a text file would do. Then, after including <boost/numeric/ublas/io.hpp>, you only have to:

test << m;
lakeweb
  • Thank you so much for your reply! I have already given up on my old idea that this "read-write" could be done by directly managing chunks of memory, so I have been thinking of doing an element-wise read/write. Your example does exactly this. My only concern is that, for a sparse 2D matrix (NxN) with M non-zeros, the two nested loops cost N^2, while the write really only needs M accesses to the non-zero elements. So is there any way to iterate over all non-zero elements efficiently? – YangLou Dec 22 '16 at 04:03
  • I found this link [link](http://stackoverflow.com/questions/1795658/looping-over-the-non-zero-elements-of-a-ublas-sparse-matrix). It seems to say that I need to iterate over all elements (zeros and non-zeros) in order to go through all the non-zeros? Or am I missing something here? – YangLou Dec 22 '16 at 04:05
  • Hi @YangLou. I have not checked with a profiler, but my guess is that because you are writing a file and filling in zeros, getting the data is not much of a bottleneck. I believe speed, with a sparse matrix, matters more for calculations on the matrix. – lakeweb Dec 22 '16 at 15:51

After some searching and experimenting, I found a way to write and read the sparse matrix. Note that my task is relatively simple, so I do not know whether this crude method will work for more complicated or more general cases.

The basic idea is to write to an ofstream by iterating over all non-zero elements of the Boost sparse matrix via a const_iterator (see this link for more details). When reading from the ifstream, I use a poor man's method: read the values back in the same format they were written, and insert them into a sparse matrix. Here is my test code:

#include <iostream>
#include <fstream>
#include <boost/numeric/ublas/matrix_sparse.hpp>
#include <boost/numeric/ublas/io.hpp>


int main(int argc, char** argv)
{
    using std::cerr;
    using std::cout;
    using std::endl;
    using namespace boost::numeric::ublas;
    typedef compressed_matrix<float, row_major> cMatrix;

    const size_t size = 5;
    const size_t rowInd[5] = { 0, 0, 1, 2, 4 };
    const size_t colInd[5] = { 0, 2, 0, 4, 4 };

    cMatrix sparseMat(size, size);
    for (size_t i = 0; i < size; ++i)
        sparseMat.insert_element(rowInd[i], colInd[i], 1.0);

    cout << sparseMat << endl;

    // Try writing to file: one "row column value" triple per stored element
    std::ofstream ofsDat("temp.dat", std::ios::out | std::ios::binary);
    for (cMatrix::const_iterator1 rowIter = sparseMat.begin1(); rowIter != sparseMat.end1(); ++rowIter) {
        for (cMatrix::const_iterator2 colIter = rowIter.begin(); colIter != rowIter.end(); ++colIter) {
            ofsDat << " " << colIter.index1() << " " << colIter.index2() << " " << *colIter;
        }   // end for colIter
    }   // end for rowIter
    ofsDat.close();

    cout << "Writing ended, starting to read" << endl;

    // Try reading the file
    cMatrix sparseMat_2(size, size);
    std::ifstream ifsDat("temp.dat", std::ios::in | std::ios::binary);
    size_t rowTemp, colTemp;
    float valTemp;
    // Test the extraction itself rather than eof(): this also stops on a
    // failed read, e.g. if the file is missing or ends with trailing whitespace.
    while (ifsDat >> rowTemp >> colTemp >> valTemp) {
        cout << "row " << rowTemp << " column " << colTemp << " value " << valTemp << endl;
        sparseMat_2.insert_element(rowTemp, colTemp, valTemp);
    }

    cout << sparseMat_2 << endl;

    return 0;
}

I added a space between the values as a separator. I don't know whether there is a better or more standard way to do this. Any feedback will be appreciated!

YangLou
  • Hi YangLou. Your matrices must be very large and very sparse to look for a file system solution that is faster. And unless they are very, very sparse, you will not gain much here, especially if you use a text stream rather than binary. – lakeweb Dec 22 '16 at 16:52
  • @lakeweb The matrix is indeed rather large (65536 by 65536), and the number of non-zero elements is around 20k. I changed the I/O format to binary; do you think that will speed up the I/O? I did not know that a text stream is used by default... – YangLou Dec 22 '16 at 22:31
  • Hi YangLou. No, `ios::binary` doesn't change the behavior of the `<<` operator. Your numbers are still being converted to ASCII as they are streamed out. In my first example they are written as binary because I'm not using the streaming operators. Are you trying to stream so much data, so intensively, between applications that you have to optimize? If so, using the file system may be the wrong approach. I'm tempted to run the profiler, as I'm pretty sure anything you do at this level will have little effect on the overall hit you are getting from using the file system. – lakeweb Dec 22 '16 at 23:55