
I have a very large data set: thousands of rows and hundreds of columns. I am trying to reshape it so that, for every mth row, each column of the strided rows (rows k, k+m, k+2m, ...) becomes one row of the output. I tried this:

in=rand(71760,320);
m=240; n=320;
[R,C]=size(in); 
out=[];
R_out=R/m; 

for k=1:m %from row 1 to mth row
    for i=1:C %reshape every column of mth row
        out=[out;reshape(in(k:m:end,i),R_out,1)'];
    end
end

If you run this code, it takes so long that you won't even bother to let it finish; it is not efficient at all. How can I improve the performance? Or is there a better way to do this?

UPDATE

This question has been extended in another thread here, to further improve the performance of the reshaping answer provided by @Teddy.

  • Could it be you mean `for k=1:m` rather than `for k=1:n`? Since you're striding by `m` in the rows (and `n` is the columns). Could you clarify the expected size of `out`? – Teddy Ort Mar 22 '17 at 06:02
  • Ah sorry, my mistake. Edited. The expected output size would be (m x C, R/m); for the case above, it would be (76800, 299). – Gregor Isack Mar 22 '17 at 06:08

1 Answer


The reason it takes so long is that the out matrix is not preallocated: growing it with out=[out; ...] forces MATLAB to reallocate and copy the entire array on every iteration.

For example, this completed in about 1 second on my laptop:

in=rand(71760,320);
m=240; n=320;
[R,C]=size(in); 
R_out=R/m; 

out=zeros(m*C,R_out);
for k=1:m %from row 1 to the mth row
    for i=1:C %each column of the kth row-stride becomes one row of out
        out(i+C*(k-1),:) = in(k:m:end,i)';
    end
end

Alternative method

The best practice would be to use a vectorized approach with arrayfun, which can be done in a single line like this:

out=cell2mat(arrayfun(@(k) in(k:m:end,:)', 1:m,'uniformoutput',0)');

This also runs quickly.
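
For reference, the arrayfun call above does the same work as an explicit loop over k that copies a whole C-row block of out per iteration; a rough equivalent sketch (same variables as above, assuming R is divisible by m so that R_out is an integer):

out=zeros(m*C,R_out); % preallocate as before
for k=1:m
    % rows (k-1)*C+1 .. k*C of out hold the transposed kth row-stride of in
    out((k-1)*C+1:k*C,:) = in(k:m:end,:)';
end

This keeps the vectorized block assignment while avoiding the small overhead of arrayfun and cell2mat.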

  • It took just seconds to finish compared to my original code. Thanks for the help! – Gregor Isack Mar 22 '17 at 06:22
  • Though the one-liner is aesthetically nicer, it is rarely faster unless of course GPUs are used. See e.g. http://stackoverflow.com/questions/12522888/arrayfun-can-be-significantly-slower-than-an-explicit-loop-in-matlab-why – Nicky Mattsson Mar 22 '17 at 08:43
  • The vectorization I was referring to was for the assignment, which gives a substantial performance improvement over the original looping assignment. However, I agree it would be even faster if the `arrayfun` call was replaced by a loop with the vectorized assignment inside it. – Teddy Ort Mar 22 '17 at 15:26
  • Tested the vectorized method, and it did improve the performance significantly. However, if the data gets bigger, neither the loop nor the vectorized method can handle it. The size I'm talking about here is something like `in=rand(291081,1920);` with `m=581;`. Any suggestions for dealing with this kind of issue? If possible without getting involved with GPUs. Thanks! – Gregor Isack Apr 01 '17 at 06:26
  • @JDane as Nicky mentioned above, you could get a slight improvement over both methods by using the vectorized assignment of the second, with an explicit for loop instead of arrayfun/cell2mat, since those introduce a small overhead. However, this won't make a very large difference. Also, since this is a different issue from your original problem (preallocation), you might want to post a new question asking specifically for ways to improve upon the methods in this answer. Maybe someone will know of a faster way. If you do, add a link here as well. – Teddy Ort Apr 01 '17 at 23:47
  • I did post another question; have a look at the update. Thanks! – Gregor Isack Apr 02 '17 at 15:21