I have two matrices X (122 x 125973) and Y (1 x 125973). I want to bootstrap my dataset by creating B observations (say B = 3). As I understand it, the B observations should be randomly drawn with replacement. How can I split X and Y in the same way into smaller bootstrap samples?
-
The "bootstrapping" part of your question is what you want to do *after* you have drawn random samples with replacement (your originally emphasised text). This is not relevant to the current question, hence I removed it from the title and the PS. If you think it is highly relevant, please edit the question to explain **why** that part is relevant. – Adriaan Aug 28 '19 at 15:19
-
Thank you for your edit. "This statistical technique consists in generating samples of size B (called bootstrap samples) from an initial dataset of size N by randomly drawing with replacement B observations." So the term bootstrapping is appropriate for the title. There is another thread, "bootstrap a dataset in R", that asks essentially the same thing for R, so I figured my original title would help others in the future who want to do the same thing in MATLAB; that is why I insist on the title. Otherwise, I will remove the PS part. – Aug 28 '19 at 15:38
1 Answer
randi() lets you draw pseudorandom integers, which may include duplicates. These can then be used as column indices into your observations. Thus:
X = rand(122,125973);
Y = rand(1,125973);
m = 3; % Desired number of observations; with replacement, m may even exceed numel(Y)
idx = randi(numel(Y),m,1); % m x 1 vector of column indices, duplicates allowed
BX = X(:,idx); % 122xm matrix
BY = Y(:,idx); % 1xm matrix
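To see that drawing one index vector and applying it to both matrices keeps each column of X paired with its entry in Y, here is a small sketch on hypothetical toy data (the 3 x 4 X and the labels in Y are made up for illustration). It also uses an m larger than the number of columns, which is fine when sampling with replacement:

```matlab
% Hypothetical toy data: column j of X holds 3*(j-1)+1 .. 3*j, and Y(j) = j
X = reshape(1:12, 3, 4);       % 3 x 4
Y = 1:4;                       % 1 x 4
m = 6;                         % more draws than columns: allowed with replacement
idx = randi(numel(Y), m, 1);   % m x 1 column indices, duplicates allowed
BX = X(:, idx);                % 3 x m
BY = Y(:, idx);                % 1 x m
% Same idx for both, so column k of BX is the column of X labelled BY(k)
disp(isequal(BX, X(:, BY)))
```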
You can also remove the drawn entries from X and Y, but since you said duplicate entries are explicitly allowed, this is probably not relevant here:
X(:,idx) = []; % assigning [] deletes the selected columns
Y(:,idx) = [];
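If you ever need draws without replacement instead (no duplicate columns within a sample), randperm(n,k) returns k distinct integers from 1:n, so it can take the place of randi in the same indexing pattern. A minimal sketch, using smaller hypothetical data:

```matlab
X = rand(122, 1000);            % hypothetical, smaller than the real data
Y = rand(1, 1000);
m = 3;                          % observations per sample (must be <= numel(Y))
idx = randperm(numel(Y), m);    % m distinct column indices, no replacement
BX = X(:, idx);                 % 122 x m
BY = Y(:, idx);                 % 1 x m
```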
If you want N samples of m observations each, simply loop:
N = 100; % Number of samples to generate
m = 3; % Number of observations per sample
X = rand(122,125973);
Y = rand(1,125973);
BX = zeros(size(X,1),m,N); % Preallocate 122 x m x N array
BY = zeros(size(Y,1),m,N); % Preallocate 1 x m x N array (Y has 1 row, not 122)
for ii = 1:N % Loop over all samples to be generated
idx = randi(numel(Y),m,1); % m x 1 column indices, drawn with replacement
BX(:,:,ii) = X(:,idx); % ii-th sample of X (122 x m)
BY(:,:,ii) = Y(:,idx); % ii-th sample of Y (1 x m), same columns as BX
end
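The loop can also be replaced by a single indexing step. A sketch, assuming the same N, m, X and Y as above: draw an m x N index matrix at once, index all columns in one go, and reshape the result into the 3D layout. Column ii of idx plays the role of idx in iteration ii of the loop:

```matlab
N = 100;                                   % number of samples
m = 3;                                     % observations per sample
X = rand(122, 125973);
Y = rand(1, 125973);
idx = randi(numel(Y), m, N);               % column ii = indices for sample ii
BX = reshape(X(:, idx), size(X,1), m, N);  % 122 x m x N
BY = reshape(Y(:, idx), size(Y,1), m, N);  % 1 x m x N
```

This trades the loop for one temporary 122 x (m*N) matrix, so for very large m*N the loop may be gentler on memory.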
BX and BY are now 3D arrays, each holding N samples of m observations. BX(:,:,n) selects the n-th sample. For more on the various ways of indexing, I suggest reading this post.
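As an illustration of what the 3D layout is for, the sketch below computes one statistic per sample along the third dimension. The statistic (the mean of Y), the fixed seed, the hypothetical data, and m = numel(Y) (the classic bootstrap sample size) are all assumptions made for this example, not part of the question:

```matlab
rng(0);                            % fixed seed, only for reproducibility
Y = rand(1, 1000);                 % hypothetical data
N = 500;                           % number of bootstrap samples
m = numel(Y);                      % assumption: sample size equals data size
BY = zeros(1, m, N);
for ii = 1:N
    idx = randi(numel(Y), m, 1);   % draw with replacement
    BY(:,:,ii) = Y(:, idx);
end
bootMeans = squeeze(mean(BY, 2));  % N x 1: mean of each bootstrap sample
seBoot = std(bootMeans);           % bootstrap standard error of the mean
seFormula = std(Y) / sqrt(m);      % textbook standard error, for comparison
fprintf('bootstrap SE %.4f, formula SE %.4f\n', seBoot, seFormula);
```

The two printed values should be close, since for the mean the bootstrap standard error approximates std(Y)/sqrt(m).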

Adriaan
-
Thank you for your time and answer, but I am not trying to draw 1 observation of size B. I am trying to draw B observations of size m from the 125973 instances I have, where m < 125973. That is, I want B observations, each consisting of a BX (122 x m) and a BY (1 x m). – Aug 28 '19 at 15:00
-
@U.User how isn't that what I wrote? You suggested 3 observations; this gives you 3. Just change the `3` in the `randi()` call to however many observations you want. – Adriaan Aug 28 '19 at 15:15
-
Oh, I see where the misunderstanding is: 'my observation' is apparently a little different from 'your observation' xD. The code you wrote draws 3 observations (3 columns) from the dataset, but the result is a single matrix (for both X and Y). What I am interested in is 3 matrices drawn randomly from the original dataset: X1 Y1, X2 Y2 and X3 Y3. The number of columns could be any m < 125973. Is that clear? I think I need a line or two to generalize your code so that it produces the 3 matrices, each generated by your current code :). – Aug 28 '19 at 15:27
-
@U.User I made it very explicit now, including `m`. You do ***NOT*** want matrices `X1`, `X2` etc.; this is called dynamic variable naming and is bad, very very bad; read [this answer of mine](https://stackoverflow.com/a/32467170/5211833). Instead, use a single array, as I did, and index into that. I.e. your `X1` corresponds to my `BX(:,:,1)`, `X2` to `BX(:,:,2)`, etc. This saves you a lot of hand-copying, doesn't break MATLAB and is much faster to execute. – Adriaan Aug 29 '19 at 07:20