You have several solutions, but generally what you need to do it output the indices of the non-zero elements as well as the values. Lets assume that you want to export to a single text file.
Generate array
Lets first generate a 10000 x 5000 sparse array with ~10% filled (it will be a bit less due to replicated indices):
N = 10000;
M = 5000;
rho = .1;
rN = ceil(sqrt(rho)*N);
rM = ceil(sqrt(rho)*M);
S = sparse(N, M);
S(randi(N, [rN 1]), randi(M, [rM 1])) = randi(255, rN, rM);
If your array is not stored as a sparse array, you can create it simply using (where M is the full array):
S = sparse(M);
Save as text file
Now we will save the matrix in the following format
row_indx col_indx value
row_indx col_indx value
row_indx col_indx value
This is done by extracting the row and column indices as well as data values and then saving it to a text file in a loop:
[n, m, s] = find(S);
fid = fopen('Sparse.txt', 'wt');
arrayfun(@(n, m, s) fprintf(fid, '%d\t%d\t%d\n', n, m, s), n, m, s);
fclose(fid);
If the underlying data is not an integer, then you can use the %f flag on the last output, e.g. (saved with 15 decimal places)
arrayfun(@(n, m, s) fprintf(fid, '%d\t%d\t%.15f\n', n, m, s), n, m, s);
Compare this to the full array:
fid = fopen('Full.txt', 'wt');
arrayfun(@(n) fprintf(fid, '%s\n', num2str(S(n, :))), (1:N).');
fclose(fid);
In this case, the sparse file is ~50MB and the full file ~170MB representing a factor of 3 efficiency. This is expected since I need to save 3 numbers for every nonzero element of the array, and ~10% of the array is filled, requiring ~30% as many numbers to be saved compared to the full array.
For floating point format, the saving is larger since the size of the indices compared to the floating point value is much smaller.
In Matlab, a quick way to extract the data would be to save the string given by:
mat2str(S)
This is essentially the same but wraps it in the sparse command for easy loading in Matlab - one would need to parse this in other languages to be able to read it in. The command tells you how to recreate the array, implying you may need to store the size of the matrix in the file as well (I recommend doing it in the first line since you can read this in and create the sparse matrix before parsing the rest of the file.
Save as binary file
A much more efficient method is to save as a binary file. Assuming the data and indices can be stored as unsigned 16 bit integers you can do the following:
[n, m, s] = find(S);
fid = fopen('Sparse.dat', 'w');
fwrite(fid, size(S), 'uint16');
fwrite(fid, [n m s], 'uint16');
fclose(fid);
Then to read the data:
fid = fopen('Sparse.dat', 'r');
sz = fread(fid, 2, 'uint16');
s = reshape(fread(fid, 'uint16'), [], 3);
s = sparse(s(:, 1), s(:, 2), s(:, 3), sz(1), sz(2));
fclose(fid);
Now we can check they are equal:
isequal(S, s)
Saving the full array:
fid = fopen('Full.dat', 'w');
fwrite(fid, full(S), 'uint16');
fclose(fid);
Comparing the sparse and full file sizes I get 21MB and 95MB.
A couple of notes:
- Using a single write/read command is much (much much) quicker than looping, so the last method is by far the fastest, and also most space efficient.
- The maximum index/data value size that can be saved as a binary integer is 2^n - 1, where n is the bitdepth. In my example of 16 bits (uint16), that corresponds to a range of 0..65,535. By the sounds of it, you may need to use 32 bits or even 64 bits just to store the indices.
- Higher efficiency can be obtained by saving the indices as one data type (e.g. uint32) and the actual values as another (e.g. uint8). However, this adds additional complexity in the saving and reading.
- You will still want to store the matrix size first, as I showed in the binary example.
- You can store the values as doubles if required, but indices should always be integers. Again, extra complexity, but doable.