2

In matlab, I want a 3D matrix with random numbers that are unique along the 3th dimension, as this code does:

M = 2;  N = 10;  L = 5;  K = 100;
mat = zeros([M N L]);
for ii=1:M
    for jj=1:N
        mat(ii,jj,:) = randperm(K,L);
    end
end

However, when the matrix is larger, the computation time rises a lot. Thus, I would like to delete the loop's with any vectorization. I couldn't figure it out any way of doing so, is it possible?

Thank you for the help.


Edit: I have run all the methods for several matrix sizes in this script, and these are the results:

enter image description here

Besides, the distribution of numbers is:

enter image description here

Thus, the implementation of @Luis Mendo is the one that scales better for low L values, which is my case. But, @Rody Oldenhuis optimized proposal is fast interdependently of the L value. Hence, a combined solution could be:

function mat = assignPermMatrix_comb(M,N,L,K)
    R = M*N;
    mat = zeros([L R]);
    if L<K*0.15
        ind = true(1,R); 
        while R
            mat(:,ind) = randi(K, L, R); 
            ind = any(diff(sort(mat))==0);
            R = nnz(ind);
        end
    else
        for ii=1:R
            mat(:,ii) = randperm(K,L);
        end
    end
    mat = reshape(mat.', [M N L]);
end

I am very thankful for all the effort that you put in your answers.

tashuhka
  • 5,028
  • 4
  • 45
  • 64
  • 1
    Have a look at my latest edit. Can you test Luis' approach vs. mine? I'm quite curious now how they compare on newer MATLAB versions :) – Rody Oldenhuis Mar 14 '14 at 15:11
  • 2
    You can see now the comparison in the updated question. It seems Luis' code is faster. Anyway, thank you so much for your time and interest. – tashuhka Mar 15 '14 at 17:53
  • 1
    Thanks for doing the comparison. Note that the running time results are sensitive to the numbers used. For example, my approach will suffer if K is not sufficiently larger than L – Luis Mendo Mar 15 '14 at 17:55
  • 1
    I must say, I'm completely mystified why my "optimized" solution is consistently *worse* than the original, even worse than `arrayfun`...Did you replace the `sort` with the two-argument `randperm`? What version of MATLAB are you using? And what's your hardware? – Rody Oldenhuis Mar 17 '14 at 09:29
  • 1
    Also, looking at the distributions, it would appear I made a mistake with the Fisher/Yates implementation...did you happen to find that mistake? – Rody Oldenhuis Mar 17 '14 at 09:30
  • @RodyOldenhuis, thank you so much for your comments. I tried again, but indeed there is not too much difference between the original and optimized implementation!? (FYI: I changed the `sort()` for `randperm`, and I am using 8.2.0.701 (R2013b) on a 64-Bit i5 CPU M540 2.5GHz (2 cores) machine with 6GB RAM) – tashuhka Mar 19 '14 at 10:50
  • I have been also trying to debug you Fisher/Yates implementation, from where did you get the concept? (maybe from here: http://bost.ocks.org/mike/shuffle/). BTW, it seems that `randperm` is already using Fisher/Yates: http://www.mathworks.com/matlabcentral/answers/20275-shuffle-matrix-elements – tashuhka Mar 19 '14 at 10:52
  • 1
    @tashuhka: "not too much difference between the original and optimized implementation" sounds better than what the graph is showing...If that's true, then it seems that the MathWorks have updated their JIT so it optimizes double-loops and 2D indexing, something that was not very fast until at least R2012b. But, as you have R2013b, it seems you can finally ignore my "optimization" :) – Rody Oldenhuis Mar 19 '14 at 10:59

3 Answers3

2

A rejection approach may be faster, depending on the values of L and K.

The idea is to generate all entries with randi without regard to repetition, detect third-dim-lines which have repetitions, and generate those again, until no repetitions exist. It's easier to work with the first two dimensions collapsed into one, and reshape at the end.

Of course, running time is random with this approach.

ind = true(1,M*N); %// lines that need generaring. Initially all of them
R = M*N; %// number of third-dim-lines that need to be generated
while R
    output(:,ind) = randi(K, L, R); %// (re)generate random values where needed 
    ind = any(diff(sort(output))==0); %// detect repetitions, for next iteration
    R = nnz(ind);
end
output = output.';
output = reshape(output, [M N L]);
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
2

Partially unrolling your loop will certainly help:

mat = zeros(L,M*N);
for ii=1:M*N        
    mat(:,ii) = randperm(K,L);
end
mat = reshape(mat.', [M N L]);

But I think the main issue is that you're using randperm with large K and small L. I'm not sure how randperm is implemented on newer versions of MATLAB (which you seem to have), but if it's anything like the version I have, it physically creates a randomized sorting of the integers 1 through K, and then extracts the first L from that array. So, if K is relatively large and L relatively small, this means you're doing a lot of unnecessary work at each loop iteration. Luis' solution is then better.

To test that theory, consider the following simple test:

M = 20;  N = 100;  
L = 5;   K = 1000;

%// Original
tic
mat = zeros([M N L]);
for ii=1:M
    for jj=1:N   
        [~,P] = sort(rand(K,1)); %// Note: I don't have the 
        mat(ii,jj,:) = P(1:L);   %// newer randperm
    end
end
toc

%// Optimized version
tic
mat = zeros(L, M*N);
for ii=1:M*N
    [~,P] = sort(rand(K,1));
    mat(:,ii) = P(1:L);
end
mat = reshape(mat.', [M N L]);
toc

%// Avoid doing so much useless work
tic
ints = 1:K;
mat = zeros(L, M*N);
for ii=1:M*N
    mat(:,ii) = inds(randi(K,L,1));
end
mat = reshape(mat.', [M N L]);
toc

Results:

Elapsed time is 0.233492 seconds. %// original
Elapsed time is 0.231393 seconds. %// optimized
Elapsed time is 0.007062 seconds. %// oh...wow.

Note that the last test is not yet a valid solution, because I'm not yet checking for uniqueness. Nevertheless, it goes to show that this is likely still how the newer randperm is doing its job.

So, the final version:

ints = 1:K;
mat = zeros(L, M*N);
for ii=1:M*N
    inds = randi(K,L,1);
    while any(diff(sort(inds))==0)
        inds = randi(K,L,1); end
    mat(:,ii) = inds();
end
mat = reshape(mat.', [M N L]);

Test results for M = 100; N = 200; L = 5; K = 100;:

Elapsed time is 0.315532 seconds.
Elapsed time is 0.297795 seconds.
Elapsed time is 0.189210 seconds.

Test results for M = 100; N = 200; L = 5; K = 100;:

Elapsed time is 10.818245 seconds.
Elapsed time is 10.733220 seconds.
Elapsed time is 0.788050 seconds.

However, test results for M = 10; N = 10; L = 40; K = 50;:

Elapsed time is 0.001326 seconds.
Elapsed time is 0.001108 seconds.
Elapsed time is 238.300146 seconds.  %// wait, WHAT?!

So, it would seem we have to come up with something smarter...

So, after a bit of soulsearching, I came up with the following:

%// This uses a form of the Fisher/Yates shuffle algorithm
mat  = zeros(L, M*N);
ints = 1:K;
inds = randi(K,M*N,L);
L1   = 1:L;

for ii = 1:M*N

    tmp = ints(L1);
    ints(L1) = ints(inds(ii,:));
    ints(inds(ii,:)) = tmp;

    mat(:,ii) = ints(L1);

end

mat = reshape(mat.', [M N L]);

Results for M = 250; N = 250; L = 150; K = 250;

Elapsed time is 2.332690 seconds.
Elapsed time is 2.140191 seconds.
Elapsed time is 1.512606 seconds.

Results for M = 250; N = 250; L = 15; K = 100;

Elapsed time is 1.021733 seconds.
Elapsed time is 0.956033 seconds.
Elapsed time is 0.445112 seconds.

Still quite disappointing really...But oh well, certainly better than it was.

Rody Oldenhuis
  • 37,726
  • 7
  • 50
  • 96
0

This should execute quicker:

s = repmat(L, [M*N 1]);
P = arrayfun(@(x)(randperm(K, x)), s, 'UniformOutput', false);
Q = cell2mat(P);
mat = reshape(Q, [M N L]);

NOTE: The randperm I have only accepts one parameter, so I couldn't try your code, this approach works for me with the anonymous function @(x)(randperm(x)) in arrayfun.

Iban Cereijo
  • 1,675
  • 1
  • 15
  • 28
  • Forget about this, ´arrayfun´ turns out to be slower than what I expected. – Iban Cereijo Mar 14 '14 at 13:26
  • 2
    Indeed, `arrayfun` is usually slower than a plain loop. By the way: `cell2mat` is just a loop, so technically, you still have 2 loops, but 1 is implicit in `arrayfun` and 1 is hidden in `cell2mat` behind some overhead...so this is indeed slower than two loops. – Rody Oldenhuis Mar 14 '14 at 13:31
  • 1
    See [this question](http://stackoverflow.com/questions/12522888/arrayfun-can-be-significantly-slower-than-an-explicit-loop-in-matlab-why). – Rody Oldenhuis Mar 14 '14 at 13:32
  • Thanks @RodyOldenhuis, I thought there was more native code involved there. – Iban Cereijo Mar 14 '14 at 13:37
  • 1
    well, there is, problem is the anonymous function and MATLAB's inability to inline it. It's a pity really... – Rody Oldenhuis Mar 14 '14 at 13:41
  • True, it is slightly better using just `@randperm` in `arrayfun`. – Iban Cereijo Mar 14 '14 at 13:52