
I have n mat-files, each containing part of a single huge sparse matrix. I want to load the mat-files and concatenate their row, column and value vectors to build the matrix W. Right now I'm doing the following, but it's really slow, and I know I shouldn't dynamically grow `rows`, `cols` and `vals`:

rows=[];cols=[];vals=[];

for ii=1:n
    var = load( sprintf('w%d.mat', ii) );   % each file contains a variable w
    [r,c,v] = find( var.w );

    % these concatenations rebuild the arrays on every iteration -- the slow part
    rows = [rows; r];
    cols = [cols; c];
    vals = [vals; v];
end

W = sparse( rows, cols, vals );

Do you know a better way? Thanks in advance!

Solution

Based on Daniel R's suggestion, I solved it like this:

% maxSize is an upper bound on the total number of nonzeros across all files
rows=zeros(maxSize,1);
cols=zeros(maxSize,1);
vals=zeros(maxSize,1);
idx = 1;

for ii=1:n
    var = load( sprintf('w%d.mat', ii) );
    [r,c,v] = find( var.w );

    len = numel(r);
    rows(idx:(idx-1+len))=r;
    cols(idx:(idx-1+len))=c;
    vals(idx:(idx-1+len))=v;
    idx = idx+len;
end

% trim the unused tail, otherwise sparse() errors on the zero indices
W = sparse( rows(1:idx-1), cols(1:idx-1), vals(1:idx-1) );

Extremely fast, thanks a lot!
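
If a tight value for maxSize is hard to estimate up front, a variant of the same idea (just a sketch; rowsC, colsC, valsC are illustrative names) is to collect the pieces per file in cell arrays and concatenate them once at the end:

% Sketch: no preallocation size needed, one concatenation at the very end
rowsC = cell(n,1); colsC = cell(n,1); valsC = cell(n,1);

for ii=1:n
    var = load( sprintf('w%d.mat', ii) );
    [rowsC{ii}, colsC{ii}, valsC{ii}] = find( var.w );
end

W = sparse( vertcat(rowsC{:}), vertcat(colsC{:}), vertcat(valsC{:}) );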

Lisa
  • Actually the code is not valid MATLAB; it should be `for ii = 1:n`. – bdecaf Jan 14 '14 at 11:59
  • Please give the profile results of which lines take most time. Also give the relevant sizes of variables in those lines and check how often these lines are called. – Dennis Jaheruddin Jan 14 '14 at 12:24
  • The `rows=[rows;r];` commands were my problem. I solved it and will update my post accordingly! – Lisa Jan 14 '14 at 12:32
  • relates to http://stackoverflow.com/questions/21023171/variable-appears-to-change-size-on-every-loop-iteration-what – Shai Jan 14 '14 at 12:41

3 Answers


You need to preallocate the arrays. Assume `r`, `c` and `v` each have size `a`-by-`b`.

In total you need `a*n` rows and `b` columns, so preallocate using `rows = nan(a*n, b)`.

To write the data into the array, you have to set the correct indices: `rows((ii-1)*a+1:ii*a, 1:end) = r`
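
A minimal sketch of that idea, assuming every file contributes the same number a of nonzeros (so the vectors returned by find are a-by-1 and b = 1):

% Sketch: fixed-size preallocation, block ii written into its own slot
rows = nan(a*n, 1);
cols = nan(a*n, 1);
vals = nan(a*n, 1);

for ii = 1:n
    var = load( sprintf('w%d.mat', ii) );
    [r,c,v] = find( var.w );
    rows((ii-1)*a+1:ii*a) = r;
    cols((ii-1)*a+1:ii*a) = c;
    vals((ii-1)*a+1:ii*a) = v;
end

W = sparse( rows, cols, vals );

If the number of nonzeros differs between files, a running index as in the solution above works instead.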

Daniel
    The problem is I don't know the dimension of `r`,`c`,`v`. However, I guess this could work anyway, by allocating `rows`,`cols`,`vals` at a maximum size and determining the size of each sub-matrix on the fly. Thanks already, I'll try this and get back to you! – Lisa Jan 14 '14 at 11:51

A few ideas:

  • It looks to me like the matrices all have the same size.
  • Another bottleneck might be the `find` function.
  • Could it happen that two matrices have different values at the same index? That might lead to problems.

You could do:

var = load( 'w1.mat' );
W = var.w;                          % start from the first part
for ii = 2:n
    var = load( sprintf('w%d.mat', ii) );
    ig = var.w ~= 0;                % nonzero entries of this part
    W(ig) = var.w(ig);              % copy them into W
end

In case `var.w` is not sparse, add a `W = sparse(W)`.

bdecaf
  • Thanks for your answer. To your remarks: 1. It's true that the matrices have the same size. 2. The find function is very fast and normally not a bottleneck. 3. Good catch, but fortunately this can't happen in my case. Your solution works, but write access to sparse matrices in that way is very very slow and should be avoided because of the way sparse matrices are organised (more information [here](http://www.mathworks.co.uk/help/matlab/math/accessing-sparse-matrices.html) if you're interested). – Lisa Jan 14 '14 at 12:10
  • I see - is your matrix small enough that you can have it in memory in full form? Then it might be faster. – bdecaf Jan 14 '14 at 12:17
  • No, that's not possible unfortunately. I found a solution in the meantime, thanks for your inputs, though! – Lisa Jan 14 '14 at 12:39

Here is a trick that sometimes helps to speed things up almost as much as if your variables were preallocated. However, sometimes it just does not help, so the only way to find out is to try:

Replace lines like:

rows = [rows; r];

by lines like:

rows(end+(1:numel(r))) = r;
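
Applied to the loop from the question, that would look roughly like this:

% Sketch: same loop, but growing via indexed assignment past the end
% instead of building a brand-new array with [rows; r] on every iteration
rows=[];cols=[];vals=[];

for ii=1:n
    var = load( sprintf('w%d.mat', ii) );
    [r,c,v] = find( var.w );
    rows(end+(1:numel(r))) = r;
    cols(end+(1:numel(c))) = c;
    vals(end+(1:numel(v))) = v;
end

W = sparse( rows, cols, vals );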
Dennis Jaheruddin