Latin hypercube sampling of 5 indepentent distributions

Question

I have 5 variables A,V,h,l and b which all stem from different distributions. I'd like to make a 1,000 equally distributed samples from each distribution by the method of latin hypercube sampling. Is this a realistic request, ie is it really better than simple random sampling? Do you have any references of how I can do this in matlab? This page suggests I would need to transform the sample somehow...

It's definitely better than random sampling, but it seems to be a very tough task to get the 5-dimensional distribution. I'd be very interested in an answer as well! — Robert Seifert, Sep 25 '13 at 16:41
By the way, if you want that somebody helps you, you need to explain how a LHC matrix looks like, I doubt a lot of people would do research just to answer your question... Have you thought about other samling designs, like central composite? they are much easier to implement. — Robert Seifert, Sep 25 '13 at 21:16
have a look at my improved answer. It will help other to understand your problem (and my curiosity). Could you please improve your question, so that a bounty makes sense? I already did it by myself, but the reviewers hadn't excepted my edits. — Robert Seifert, Sep 28 '13 at 12:15

score 5 · Accepted Answer · edited Sep 08 '20 at 06:09

UPDATE #2: solution using built-in function of Statistics Toolbox

The basic question is whether you want your samples on a regular grid or not. If not, you could use the built-in function lhsdesign:

p = 1000   % Number of points
N = 5   % Number of dimensions
lb = [1 1 1 1 1]; % lower bounds for A,V,h,l and b
ub = [10 10 10 10 10]; % upper bounds for A,V,h,l and b
X = lhsdesign(p,N,'criterion','correlation');
D = bsxfun(@plus,lb,bsxfun(@times,X,(ub-lb)));

'criterion','correlation' would give you the desired "equal distribution".

D then contains the irregular coordinate-distribution for your parameters.

First I thought you we're looking for samples on a regular grid, which really seems to be a tough task. I tried to modify the approach above D = round(bsxfun...), but it won't give you satisfying results. So for this case I still provide my initial idea here:

The following solution is far from fast and elegant, but it's at least a solution.

% For at least 1000 samples M=6 divisions are necessary
M = 6;
N = 5;
% the perfect LHC distribution would have 1296 samples for M=6 divisions
% and 5 dimensions
counter_max = M^(N-1); %=1296

% pre-allocation
D = zeros(6,6,6,6,6);
counter = 0;

while counter < 1000
    c = randi(6,1,5);
    if ( sum( D( c(1) , c(2) , c(3) , c(4) ,  :    )) < 1 && ...
         sum( D( c(1) , c(2) , c(3) ,   :  , c(5)  )) < 1 && ...   
         sum( D( c(1) , c(2) ,   :  , c(4) , c(5)  )) < 1 && ...   
         sum( D( c(1) ,   :  , c(3) , c(4) , c(5)  )) < 1 && ...   
         sum( D(   :  , c(2) , c(3) , c(4) , c(5)  )) < 1  )
     
     D(c(1),c(2),c(3),c(4),c(5)) = 1;
              
     X(counter,:) = c;
     counter = counter+1;

    end
end

X is finally containing the coordinates of all your samples.

As you see I used a while-loop with an underlying if-condition. You wish 1000 samples, that's a realistic number and can be done in a reasonable time. I actually would recommend you to use a number of samples as close as possible to the maximum of 1296. That could take you a pretty while. But as you create the resulting matrix just once and use it again and again, don't hesitate to run it 24hrs. You could also implement an interuption code as described here: In MatLab, is it possible to terminate a script, but save all its internal variables to workspace? and see how many samples you got until then. (I got 900 samples in 20min when I was testing)

UPDATE: Example to show limitations of method:

The following example shall illustrate, what the asker could be willing to do and what the result is actually supposed to look like. Because I'm also very interested in a good solution, mine is limited and cannot provide the "100% result".

Imagine a cube (N=3) with M=10 divisions.

M = 10;
N = 3;
counter_max = M^(N-1);   %=100 maximum number of placeable samples

% pre-allocations
D = zeros(10,10,10);
counter = 0;

while counter < counter_max
    c = randi(10,1,3);
    % if condition checks if there is already a sample in the same row,
    % coloumn or z-coordinate, 
    if ( sum( D( c(1) , c(2) ,   :  )) < 1 && ...   
         sum( D( c(1) ,   :  , c(3) )) < 1 && ...   
         sum( D(   :  , c(2) , c(3) )) < 1  )
       
       %if not a new sample is generated
       D(c(1),c(2),c(3)) = 1;
       counter = counter+1;
       X(counter,:) = c;

    end
end

After about 10000 iterations one gets the following distribution with 85 of 100 possible placed samples: enter image description here where the color indicates the normalized distance to the closest neighbor. For most of the points it's fine (1), but as there are 15 missing samples, some points are more distant from others.

The problem is: I doubt that it's possible to get all 100 samples in a reasonable time. When one plots the generated samples over the number of iterations you get: enter image description here

...so the desired result seems hardly obtainable.

Please see this answer more as an encouragement than a solution.

That's fantastic! Thank you very much indeed (when bounty is available I'll put some on)! Two minor questions: **1)** X(counter)=c, gives me an error: **In an assignment A(I) = B, the number of elements in B and I must be the same.**. X{counter}=c works but is this in the format you intend? **2)** Also, I'm looking to transform the sample points we've made to extract values from different distributions: e.g. a Gamma distribution and a Normal distribution for these different variables. How can the matrix X be transformed to achieve this? — HCAI, Sep 26 '13 at 10:32
@HCAI to 1) it was a typo, I corrected it. 2) I cannot answer your second question, but I guess you should work with Matrix `D` instead of `X`. I also thought about a bounty, as I'm really interested in a better solution myself, though I'm not working on a similiar problem at the moment. But you need to improve your question, put some basic examples from wikipedia there and explain how the design matrix 'D' generally looks like. — Robert Seifert, Sep 26 '13 at 10:40

DW Kim · Answer 2 · 2019-02-09T10:50:18.617

By combining 1-D latin hypercube samples (LHS), you can make a full set of LHS for regular grid in higher order dimension. For example, imagine 3X3 LHS (ie 2-D and 3 divisions). First, you just make 1-D LHS for regular grid. (1,0,0), (0, 1, 0), (0, 0, 1) for 1-D. And then, combine the 1-D LHS to make 2-D LHS.

1, 0, 0

0, 1, 0

0, 0, 1

or

0, 1, 0

1, 0, 0

0, 0, 1

... etc.

LHS for 3-D can also be created using the same method(by combining 2-D LHS).

There are 12 possible LHS for 3X3. Generally, the number of possible LHS is N x ((M-1)!)^(M-1). (N=divisions, M=dimensions)

The following code shows LHS for 3-D and 10 divisions.

This code generates only one LHS.

result is random.

it takes 0.001288 sec for 100% result

clear;
clc;

M = 3; % dimension
N = 10; % division

Sel2 = ':,';
stop = 0;

P_matrix = repmat([num2str(N),','],1,M);
P_matrix = P_matrix(1:end-1);
eval(['P = zeros(', P_matrix, ');']);

P(1,1) = 1;
tic

while stop == 0
for i = 1 : M-1
    for j = 2:N
        if i == 1
            P(end    , j, 1) = P(1    , j-1, 1);
            P(1:end-1, j, 1) = P(2:end, j-1, 1);
        else
            Sel_2 = repmat(Sel2,1,i-1);
            Sel_2 = Sel_2(1:end-1);
            eval(['P(', Sel_2, ',end    , j, 1)     = P(', Sel_2 , ', 1    , j-1, 1);']);
            eval(['P(', Sel_2, ',1:end-1    , j, 1) = P(', Sel_2 , ', 2:end, j-1, 1);']);
        end
    end
    if i == 1
        P(:,:,1) = P(randperm(N),:,1);
    elseif i <M-1
        Sel_2 = repmat(Sel2,1,i);
        Sel_2 = Sel_2(1:end-1);
        eval(['P(',Sel_2,',:,1) = P(',Sel_2,',randperm(N),1);']);
    else
        Sel_2 = repmat(Sel2,1,i);
        Sel_2 = Sel_2(1:end-1);
        eval(['P(',Sel_2,',:) = P(',Sel_2,',randperm(N));']);
    end
end

% you can add stop condition
stop = 1;
end
toc

[x, y, z] = ind2sub(size(P),find(P == 1));
scatter3(x,y,z);
xlabel('X');
ylabel('Y');
zlabel('Z');

Result

score 0 · Answer 3 · answered Aug 29 '20 at 21:13

This code gives the same result as the accepted one in this discussion. Check it out!

n=1000; p=5; 
distr=lhsdesign(n,p); %creates LHS of n samples on each of your p dimensions 
% Then, you can choose any inverted distribution. in this case it is the Discrete Uniform Distribution
Parameters=[unidinv(distr(:,1),UB1) unidinv(distr(:,2),UB2) ...  
            unidinv(distr(:,3),UB3) unidinv(distr(:,4),UB4) ...
            unidinv(distr(:,5),UB5) ];
%At the end, you'll do a simple work of indexing.

Latin hypercube sampling of 5 indepentent distributions

3 Answers3