I have a cell array containing an arbitrary number of matrices, say a = {[300*20],....,[300*20]};. I have another cell array of the same format, call it b, that contains logical matrices marking the positions of the NaN entries in a.

I want to use cellfun to loop through the cell array and set the NaN entries to 0, i.e. the equivalent of a(b) = 0 for each matrix.

Thanks, j

Eitan T
frage

3 Answers


You could define a function that replaces any NaN with zero.

function a = nan2zero(a)
  a(isnan(a)) = 0;

Then you can use cellfun to apply this function to your cell array.

a0 = cellfun(@nan2zero, a, 'UniformOutput', false)

That way, you don't even need any matrices b.
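As a quick sanity check on toy data, the same idea can be sketched inline. This is only an illustration: the values in `a` are made up, and it swaps the `nan2zero` helper for the built-in `fillmissing`, which assumes MATLAB R2016b or later.

```matlab
% Toy data: two small matrices containing NaNs (illustrative values only)
a = {[1 NaN; 3 4], [NaN 2; NaN 6]};

% Inline stand-in for nan2zero: fillmissing replaces every NaN with the constant 0
% (assumption: fillmissing is available, i.e. R2016b or later)
a0 = cellfun(@(x) fillmissing(x, 'constant', 0), a, 'UniformOutput', false);

% a0{1} is now [1 0; 3 4] and a0{2} is [0 2; 0 6]; the input a is unchanged
```

On older releases, the `nan2zero` function file above does the same job.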

s.bandara
  • Assuming the matrices in `b` already exist, it would be more efficient to use them as a second input to your function. Admittedly the speed difference will be minimal except when `a` and `b` are *really* large :-) But yeah, +1 – Colin T Bowers Jan 15 '13 at 04:34
  • Not even sure about that. Testing for *NaN* should be much faster than passing `b` around. – s.bandara Jan 15 '13 at 04:35
  • Passing `b` in should be faster as long as nothing is assigned to `b` inside the function. Matlab will treat `b` "By Ref" as long as you don't alter it. – Colin T Bowers Jan 15 '13 at 04:37
  • Actually the more I think about it, the less convinced I am by my own argument. I might run a couple of tests :-) If I get round to it, I'll post the results here. – Colin T Bowers Jan 15 '13 at 04:39
  • The by reference argument sounds pretty good. I was thinking about testing, too. I don't think there exists a means to have `cellfun` process two cell arrays, though, and combining each matrix in `a` and `b` would be worse. Let me know what you get. If MATLAB optimizes the `a(isnan(a)) = 0` as a whole, `b` will look quite sad I would bet. – s.bandara Jan 15 '13 at 04:42
  • Done some tests, posted them in a new answer since it was too much for a comment. Cheers! – Colin T Bowers Jan 15 '13 at 05:07

First, you should probably give the tick to @s.bandara, as that was the first correct answer and it used cellfun (as you requested). Do NOT give it to this answer. The purpose of this answer is to provide some additional analysis.

I thought I'd look into the efficiency of some of the possible approaches to this problem.

The first approach is the one advocated by @s.bandara.

The second approach is similar to the one advocated by @s.bandara, but it uses b to convert NaN to 0 rather than calling isnan. In theory, this method may be faster: nothing is assigned to b inside the function, so MATLAB should pass it without copying (effectively "by reference").

The third approach uses an explicit loop instead of cellfun, since cellfun is often slower than an explicit loop.

The results of a quick speed test are:

Elapsed time is 3.882972 seconds. %# First approach (a, isnan, and cellfun, eg @s.bandara)
Elapsed time is 3.391190 seconds. %# Second approach (a, b, and cellfun)
Elapsed time is 3.041992 seconds. %# Third approach (loop-based solution)

In other words, there are (small) savings to be made by passing b in rather than using isnan. And there are further (small) savings to be made by using a loop rather than cellfun. But I wouldn't lose sleep over it. Remember, the results of any simulation are specific to the specified inputs.

Note that these results were consistent across several runs. I timed each method with tic and toc, looping over it many times. To be really thorough, I should use timeit from the File Exchange (FEX). If anyone is interested, the code for the three methods follows:

%# Build some example matrices
T = 1000; N = 100; Q = 50; M = 100;
a = cell(1, Q); b = cell(1, Q);
for q = 1:Q
    a{q} = randn(T, N);
    b{q} = logical(randi(2, T, N) - 1);
    a{q}(b{q}) = nan;
end

%# Solution using a, isnan, and cellfun (@s.bandara solution)
tic
for m = 1:M
    Soln1 = cellfun(@f1, a, 'UniformOutput', false);
end
toc

%# Solution using a, b, and cellfun
tic
for m = 1:M
    Soln2 = cellfun(@f2, a, b, 'UniformOutput', false);
end
toc


%# Solution using a loop to avoid cellfun
tic
for m = 1:M
    Soln3 = cell(1, Q);
    for q = 1:Q
        Soln3{q} = a{q};
        Soln3{q}(b{q}) = 0;
    end
end
toc

%# Solution proposed by @EitanT
[K, N] = size(a{1});
tic
for m = 1:M
    a0 = [a{:}];       %// Concatenate matrices along the 2nd dimension
    a0(isnan(a0)) = 0; %// Replace NaNs with zeroes    
    Soln4 = mat2cell(a0, K, N * ones(size(a)));
end
toc

where:

function x1 = f1(x1)
x1(isnan(x1)) = 0;

and:

function x1 = f2(x1, x2)
x1(x2) = 0;
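For a more careful measurement than tic and toc, the comparison could be rerun with `timeit`. The following is a minimal sketch, not the code that produced the numbers in this answer. It assumes `timeit` is on the path (it ships with MATLAB from R2013b; earlier it was a File Exchange download) and that the code is saved as a script in R2016b or later, so the local functions at the end are allowed. The names `maskToZero` and `zeroWithLoop` and the value of `Q` are hypothetical.

```matlab
% Illustrative inputs at roughly the scale in the question (values are made up)
Q = 10;
a = cell(1, Q); b = cell(1, Q);
for q = 1:Q
    a{q} = randn(300, 20);
    b{q} = rand(300, 20) < 0.5;   % logical mask of NaN positions
    a{q}(b{q}) = NaN;
end

% timeit expects a handle to a function that takes no inputs
tCellfun = timeit(@() cellfun(@maskToZero, a, b, 'UniformOutput', false));
tLoop    = timeit(@() zeroWithLoop(a, b));
fprintf('cellfun: %.6f s, loop: %.6f s\n', tCellfun, tLoop);

% Local functions (in a script file they must come after all other code)
function x = maskToZero(x, mask)
    % Same operation as f2: zero the masked entries
    x(mask) = 0;
end

function out = zeroWithLoop(a, b)
    % Same operation as the loop-based solution
    out = a;
    for q = 1:numel(a)
        out{q}(b{q}) = 0;
    end
end
```

`timeit` runs each handle several times and reports a typical time per call, so the outer `for m = 1:M` repetition loop is not needed here.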

UPDATE: A fourth approach has been suggested by @EitanT. This approach concatenates the cell array of matrices into one large matrix, performs the operation on the large matrix, then optionally converts it back to a cell array. I have added the code for this procedure to my testing routine above. For the inputs specified in my testing code, ie T = 1000, N = 100, Q = 50, and M = 100, the timed run is as follows:

Elapsed time is 3.916690 seconds. %# @s.bandara
Elapsed time is 3.362319 seconds. %# a, b, and cellfun
Elapsed time is 2.906029 seconds. %# loop-based solution
Elapsed time is 4.986837 seconds. %# @EitanT

I was somewhat surprised by this as I thought the approach of @EitanT would yield the best results. On paper, it seems extremely sensible. Note, we can of course mess around with the input parameters to find specific settings that advantage different solutions. For example, if the matrices are small, but the number of them is large, then the approach of @EitanT does well, eg T = 10, N = 5, Q = 500, and M = 100 yields:

Elapsed time is 0.362377 seconds. %# @s.bandara
Elapsed time is 0.299595 seconds. %# a, b, and cellfun
Elapsed time is 0.352112 seconds. %# loop-based solution
Elapsed time is 0.030150 seconds. %# @EitanT

Here the approach of @EitanT dominates.

For the scale of the problem indicated by the OP, I found that the loop-based solution usually had the best performance. However, for some values of Q, e.g. Q = 5, the solution of @EitanT managed to edge ahead.

Colin T Bowers
  • Looks like yours is faster, dude. +1. – s.bandara Jan 15 '13 at 05:44
  • @s.bandara *Slightly* faster using a non-rigorous timing methodology. But yeah, I'll claim it :-) – Colin T Bowers Jan 15 '13 at 05:53
  • @ColinTBowers +1 for the analysis. Can you time my suggested solution as well? (I'm not near MATLAB at the moment) :) – Eitan T Jan 15 '13 at 08:42
  • @EitanT Done! I've added an update to my answer. But I don't understand the results. When you get a chance, I'd be interested in hearing if you get similar numbers on your machine, or if you can spot a problem with my code. The results simply do not make any sense to me :-) ps +1 for your answer. On paper, it seems extremely sensible. On paper... – Colin T Bowers Jan 15 '13 at 11:06
  • @ColinTBowers Thanks. Actually I've tested it on 300x20 matrices like in the OP's example, and the matrix solution was almost twice as fast than the `for` loop. I'm a bit perplexed by your results myself... – Eitan T Jan 15 '13 at 11:55
  • @EitanT Have you tried running my exact code? The results from the profiler make me suspect there is a bug in it - but I can't see what it might be. Anyway, off to bed now. Will check back in tomorrow. Cheers. – Colin T Bowers Jan 15 '13 at 12:08
  • @ColinTBowers Yeah I ran your code and got results similar to yours. That's why I'm puzzled. – Eitan T Jan 15 '13 at 12:33
  • @EitanT Found it! Fresh eyes always helps. I was using `M` to control the number of iterations of each method. But your code (which I pasted in without checking carefully enough) was using `M` to measure the length of one of the matrix dimensions. I've fixed it up, re-run the simulations, and adjusted my answer. Cheers! – Colin T Bowers Jan 16 '13 at 00:57
  • Great! I'm still curious though, why it gets slower than a loop in some cases. – Eitan T Jan 16 '13 at 06:35
  • @EitanT I agree it is odd. Ex ante I thought your method would be fastest. I'm afraid I'm a bit under the gun at the moment and don't have time to track it down any further. But if you look into it and find anything interesting, please let me know :-) – Colin T Bowers Jan 16 '13 at 08:14

Hmm.

Given the nature of the contents of your cell array, there may exist an even faster solution: you can convert your cell data to a single matrix and use vector indexing to replace all NaN values in it at once, without the need of cellfun or loops:

a0 = [a{:}];       %// Concatenate matrices along the 2nd dimension
a0(isnan(a0)) = 0; %// Replace NaNs with zeroes

If you want to convert it back to a cell array, that's fine:

[M, N] = size(a{1});
mat2cell(a0, M, N * ones(size(a)))

P.S.
Work with a 3-D matrix instead of a cell array, if possible. Vectorized operations are usually much faster in MATLAB.
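A minimal sketch of the 3-D alternative, assuming every matrix in the cell array has the same size (as in the question's 300-by-20 example). The toy values are made up, and the conversion back to a cell array is optional.

```matlab
% Toy data: two equally sized matrices containing NaNs (illustrative values only)
a = {[1 NaN; 3 4], [NaN 2; 5 NaN]};

% Stack the matrices along the third dimension: A is 2-by-2-by-2
A = cat(3, a{:});

% One vectorized pass replaces every NaN in the whole stack
A(isnan(A)) = 0;

% Optional: split the pages back into a 1-by-Q cell array
a0 = reshape(num2cell(A, [1 2]), 1, []);
```

Page `A(:, :, k)` corresponds to `a{k}`, so if the data can live in `A` from the start, both conversions are avoided.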

Eitan T