I have two matrices, X (122 x 125973) and Y (1 x 125973). I want to bootstrap my dataset by creating B observations (say B = 3). As I understand it, the B observations should be drawn randomly with replacement. How can I split X and Y in the same way into smaller bootstrap samples?

  • The "bootstrapping" part of your question is what you want to do *after* you have drawn random samples with replacement (your original emphasised text). This is not relevant to the current question, hence I removed that from the title and PS. If you think this is highly relevant, please [edit] the question on **why** that part is relevant. – Adriaan Aug 28 '19 at 15:19
  • Thank you for your edit. "This statistical technique consists in generating samples of size B (called bootstrap samples) from an initial dataset of size N by randomly drawing with replacement B observations." So the term bootstrapping is good for me in the title. There is another thread called "bootstrap a dataset in R" that basically asks for the same thing but in R, so I figured my first title should help others in the future who want to do the same thing in Matlab, which is why I insist on the title. Otherwise, I will remove the PS part. –  Aug 28 '19 at 15:38

1 Answer

randi() lets you draw pseudorandom integers with replacement, so duplicate entries can occur. These integers can then be used as column indices into your observations. Thus:

X = rand(122,125973);
Y = rand(1,125973);
m = 3; % Your desired number of observations, drawn with replacement
idx = randi(numel(Y),m,1); % Generate an mx1 vector of column indices
BX = X(:,idx); % 122xm matrix
BY = Y(:,idx);  % 1xm matrix
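
As a quick sanity check that X and Y stay paired, here is a minimal sketch on a small toy dataset. The sizes, the seed and the choice `Y = X(1,:)` are illustrative assumptions, not part of the original data. Because the same `idx` is used for both matrices, column k of `BX` always belongs with column k of `BY`:

```matlab
rng(0);                      % Fix the seed so the draw is repeatable (assumed seed)
X = rand(4,10);              % Toy data: 4 features, 10 observations
Y = X(1,:);                  % Toy response tied to X so the pairing can be checked
m = 3;                       % Number of observations to draw
idx = randi(size(X,2),m,1);  % Column indices, drawn with replacement
BX = X(:,idx);               % 4 x m
BY = Y(:,idx);               % 1 x m
isequal(BX(1,:),BY)          % true when X and Y were sampled consistently
```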

You can also remove the drawn entries from X and Y, but since you said duplicate entries are explicitly allowed, this is probably not what you need:

X(:,idx) = [];  % [] sets to empty array, thus removes the drawn columns
Y(:,idx) = [];  % Remove the same columns from Y to keep X and Y aligned
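
If the point of removing entries is to keep the observations that were not drawn, a logical mask does this without modifying X and Y. This is only a sketch on assumed toy data; the names `drawn`, `OX` and `OY` are hypothetical:

```matlab
X = rand(4,10);  Y = rand(1,10);   % Toy data (assumed sizes)
idx = randi(size(X,2),3,1);        % Indices drawn with replacement
drawn = false(1,size(X,2));        % Logical mask over all observations
drawn(idx) = true;                 % A duplicate in idx marks its column only once
OX = X(:,~drawn);                  % Observations that were not drawn
OY = Y(:,~drawn);                  % Matching entries of Y
```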

If you want several samples of m observations each, simply loop:

N = 100;  % Number of observation matrices to be generated
m = 3; % Number of observations per matrix
X = rand(122,125973);
Y = rand(1,125973);
BX = zeros(size(X,1),m,N);  % Preallocate 3D array for collection, 122xmxN
BY = zeros(size(Y,1),m,N);  % Preallocate 1xmxN; must use size(Y,1), not size(X,1)

for ii = 1:N  % Loop over all matrices to be generated
    idx = randi(numel(Y),m,1); % Generate an mx1 vector of column indices
    BX(:,:,ii) = X(:, idx); % Fill the ii-th 122xm slice
    BY(:,:,ii) = Y(:, idx);  % Fill the ii-th 1xm slice
end
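
The same arrays can also be built without a loop by drawing all indices at once as an m x N matrix and reshaping. The sketch below assumes small toy sizes to keep it quick. It relies on MATLAB's column-major order, so the first m columns form sample 1, the next m columns sample 2, and so on:

```matlab
N = 5;  m = 3;                          % Toy values (assumed)
X = rand(4,10);  Y = rand(1,10);        % Toy data (assumed sizes)
idx = randi(size(X,2),m,N);             % Column ii holds the indices of sample ii
BX = reshape(X(:,idx),size(X,1),m,N);   % 4 x m x N
BY = reshape(Y(:,idx),size(Y,1),m,N);   % 1 x m x N
isequal(BX(:,:,2),X(:,idx(:,2)))        % Sample 2 matches its own index column
```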

BX and BY are now 3D matrices containing N matrices with m observations each. Calling BX(:,:,n) selects the n-th matrix of observations. For more on the various ways of indexing, I suggest reading this post.
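
For instance, to compute one statistic per bootstrap sample, you can reduce along the second dimension (the observations). This is a minimal sketch, assuming toy data and the sample mean as the statistic; `bootMeans` and `firstSample` are hypothetical names:

```matlab
N = 100;  m = 3;                   % Number of samples and observations per sample
Y = rand(1,10);                    % Toy response (assumed size)
idx = randi(numel(Y),m,N);         % One column of indices per sample
BY = reshape(Y(idx),1,m,N);        % 1 x m x N
bootMeans = squeeze(mean(BY,2));   % N x 1 vector holding the mean of each sample
firstSample = BY(:,:,1);           % 1 x m, the first bootstrap sample
```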

Adriaan
  • Thank you for your time and answer, but I am not trying to draw 1 observation with size B. I am trying to draw B observations from the 125973 instances I have with a size m where m<125973. Meaning I want to have B observations BX (122 x m) and BY (1 x m). –  Aug 28 '19 at 15:00
  • @U.User how isn't that what I wrote? You suggested 3 observations, this gives you 3. just change the `3` in the `randi()` call to however many observations you want. – Adriaan Aug 28 '19 at 15:15
  • Oh, I see where the misunderstanding is: 'My observation apparently is a little different from your observation' xD. The code you wrote gives 3 observations (3 rows) from the dataset, but the result is a single matrix (for both X and Y). What I am interested in is 3 matrices drawn randomly from the original dataset: X1 Y1, X2 Y2, and X3 Y3. The number of rows could be any m < 125973. Is that clear? => I think I need a line or two to generalize your code to get the 3 matrices, where each matrix is generated by your current code :). –  Aug 28 '19 at 15:27
  • @U.User I made it very explicit now, including `m`. You do ***NOT*** want matrices `X1`, `X2` etc, this is called Dynamic Variable Naming and is bad, very very bad, read [this answer of mine](https://stackoverflow.com/a/32467170/5211833). Instead, use a single matrix, as I did, and index into that. I.e. your `X1` corresponds to my `BX(:,:,1)`, `X2` to `BX(:,:,2)` etc. This saves you a lot of hand-copying, doesn't break MATLAB and is way faster to execute. – Adriaan Aug 29 '19 at 07:20
  • Oh, interesting. Thank you, I will follow your advice. –  Aug 30 '19 at 11:22