random sampling - matrix

Question

How can I take a sample of n random points from a matrix populated with 1s and 0s?

a=rep(0:1,5)
b=rep(0,10)
c=rep(1,10)
dataset=matrix(cbind(a,b,c),nrow=10,ncol=3)

dataset
      [,1] [,2] [,3]
 [1,]    0    0    1
 [2,]    1    0    1
 [3,]    0    0    1
 [4,]    1    0    1
 [5,]    0    0    1
 [6,]    1    0    1
 [7,]    0    0    1
 [8,]    1    0    1
 [9,]    0    0    1
[10,]    1    0    1

I want to be sure that the positions (row, col) from where I take the n samples are random.

I know sample {base}, but it doesn't seem to allow me to do that. The other methods I know are spatial methods, which would force me to add x,y coordinates, convert the matrix to a spatial object, and then convert it back to a normal matrix.

More information

By random I also mean spread inside the "matrix space": e.g. if I sample 4 points, I don't want 4 neighboring points as a result; I want them spread across the "matrix space".

Knowing the position (row, col) in the matrix from which each random point was taken would also be important.

I don't see any option for "random". Maybe this is implicit in the function sample {base}. What I want to be sure of is that the selected points are spread out, not clustered inside the matrix. If I take a sample of 10 points, the 10 points should be random in the matrix space. – Gago-Silva, Feb 02 '12 at 09:50
I agree that sample is not really clear on being random, although it is. If you want spread, then random sampling is not a guarantee. – Paul Hiemstra, Feb 02 '12 at 09:53
I added a more philosophical discussion on sampling to my answer. – Paul Hiemstra, Feb 02 '12 at 10:07

score 12 · Accepted Answer · answered Feb 02 '12 at 09:45

There is a very easy way to sample a matrix that works if you understand that R represents a matrix internally as a vector.

This means you can use sample directly on your matrix. For example, let's assume you want to sample 10 points with replacement:

n <- 10
replace=TRUE

Now just use sample on your matrix:

set.seed(1)
sample(dataset, n, replace=replace)
 [1] 1 0 0 1 0 1 1 0 0 1

To demonstrate how this works, let's decompose it into two steps. Step 1 is to generate an index of sampling positions, and step 2 is to find those positions in your matrix:

set.seed(1)
mysample <- sample(length(dataset), n, replace=replace)
mysample
 [1]  8 12 18 28  7 27 29 20 19  2

dataset[mysample]
 [1] 1 0 0 1 0 1 1 0 0 1

And, hey presto, the results of the two methods are identical.
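
Since the question also asks for the (row, col) position of each sampled point, the linear positions in mysample can be converted back to matrix coordinates. A minimal sketch, assuming base R's arrayInd() and the example dataset from the question; the names positions and result are illustrative and not part of the original answer:

```r
# Rebuild the example matrix from the question.
dataset <- cbind(rep(0:1, 5), rep(0, 10), rep(1, 10))

n <- 10
set.seed(1)
# Sample linear (column-major) positions, as in step 1 above.
mysample <- sample(length(dataset), n, replace = TRUE)

# Convert each linear position to its (row, col) pair.
positions <- arrayInd(mysample, dim(dataset))
colnames(positions) <- c("row", "col")

# Positions and sampled values side by side.
result <- cbind(positions, value = dataset[mysample])
result
```

Exact draws for a given seed can differ between R versions (the default sampling algorithm changed in R 3.6.0), so no output is shown here.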

score 4 · Answer 2 · Paul Hiemstra · 2012-02-02T10:05:05.783

Using sample seems the best bet for you. To get 1000 random positions, you can do something like:

rows = sample(1:nrow(dataset), 1000, replace = TRUE)
columns = sample(1:ncol(dataset), 1000, replace = TRUE)

I think this gives what you want, but of course I could be mistaken.

Extracting the items from the matrix can be done like this:

random_sample = mapply(function(row, col) 
                           return(dataset[row,col]), 
                    row = rows, col = columns)
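
As an alternative to mapply, base R also accepts a two-column matrix of (row, col) pairs as an index, which returns one element per pair. A minimal sketch, assuming the example dataset from the question; via_mapply is an illustrative name used only to check that both approaches agree:

```r
# Rebuild the example matrix from the question.
dataset <- cbind(rep(0:1, 5), rep(0, 10), rep(1, 10))

set.seed(1)
rows    <- sample(seq_len(nrow(dataset)), 1000, replace = TRUE)
columns <- sample(seq_len(ncol(dataset)), 1000, replace = TRUE)

# Indexing with a two-column matrix picks one element per (row, col) pair,
# so the result is a vector of 1000 values, not a 1000 x 1000 matrix.
random_sample <- dataset[cbind(rows, columns)]

# Compare with the mapply() approach from the answer.
via_mapply <- mapply(function(row, col) dataset[row, col],
                     row = rows, col = columns)
all(random_sample == via_mapply)
```

The rows and columns vectors double as a record of where each value came from.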

Sampling strategies

In the comments you mention that your sample needs to have spread. Because of its random nature, a random sample gives no guarantee that there will be no clusters. There are several other sampling schemes that might be interesting to explore:

Regular sampling: skip the randomness and just sample at regular intervals. This covers the entire matrix space evenly, but there is no randomness.
Stratified random sampling: divide your matrix space into regular subsets, and then sample randomly within those subsets. This is a mix between random and regular sampling.
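
Stratified random sampling, as described above, could be sketched roughly as follows. Note that stratified_sample is a hypothetical helper, not a base R function, and using rectangular blocks of rows and columns as strata (with one random cell drawn per block) is an assumption made for illustration:

```r
# Rebuild the example matrix from the question.
dataset <- cbind(rep(0:1, 5), rep(0, 10), rep(1, 10))

# Hypothetical helper: split the matrix into row_blocks x col_blocks
# rectangular strata and draw one random cell from each stratum.
stratified_sample <- function(m, row_blocks, col_blocks) {
  stopifnot(row_blocks <= nrow(m), col_blocks <= ncol(m))
  # Assign every row and column index to a block number.
  row_id <- ceiling(seq_len(nrow(m)) * row_blocks / nrow(m))
  col_id <- ceiling(seq_len(ncol(m)) * col_blocks / ncol(m))

  strata <- expand.grid(row_block = seq_len(row_blocks),
                        col_block = seq_len(col_blocks))
  picks <- matrix(NA_integer_, nrow = nrow(strata), ncol = 2,
                  dimnames = list(NULL, c("row", "col")))
  for (i in seq_len(nrow(strata))) {
    r <- which(row_id == strata$row_block[i])
    k <- which(col_id == strata$col_block[i])
    # sample.int() on the candidate count avoids sample()'s special
    # handling of a length-one numeric vector.
    picks[i, "row"] <- r[sample.int(length(r), 1)]
    picks[i, "col"] <- k[sample.int(length(k), 1)]
  }
  # Return the position and value of each sampled cell.
  data.frame(picks, value = m[picks])
}

set.seed(1)
# Five strata of two rows each, spanning all three columns.
strat <- stratified_sample(dataset, row_blocks = 5, col_blocks = 1)
strat
```

Each sampled point then comes from a different pair of rows, so the points cannot all be neighbors, while the position within each stratum is still random.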

To check if your random sampling produces good results, I'd repeat the random sampling a few times and compare the results (as I assume that the sampling will be input for another analysis?).
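
One way to do this repetition is sketched below. It assumes that the share of 1s in each sample is a meaningful summary for the downstream analysis; that choice, and the names n_repeats and share_of_ones, are assumptions made for illustration:

```r
# Rebuild the example matrix from the question.
dataset <- cbind(rep(0:1, 5), rep(0, 10), rep(1, 10))

set.seed(1)
n <- 10           # points per sample
n_repeats <- 100  # number of repeated samples

# Draw n_repeats independent samples and record the share of 1s in each.
share_of_ones <- replicate(n_repeats,
                           mean(sample(dataset, n, replace = TRUE)))

# Compare the spread of the sample summaries with the value for the
# full matrix.
mean(dataset)
summary(share_of_ones)
```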

edited Feb 02 '12 at 10:05

answered Feb 02 '12 at 09:48

Paul Hiemstra



This is correct, but then you still have a challenge to extract the elements from your matrix. If you were to do `dataset[rows, columns]` it will result in a 1000*1000 matrix, not a vector of 1000 elements. I gave up on this approach after two minutes, but I'd be interested to see how you solve it. – Andrie Feb 02 '12 at 09:51

+1 Nice use of `mapply` (although I think using `sample` directly on the matrix is much simpler). – Andrie Feb 02 '12 at 10:11
Yes, this will be used for other analyses. I will check multiple samples to be sure how it is sampling the matrix; in the end I will probably try something like stratified random sampling, which seems more appropriate. – Gago-Silva Feb 02 '12 at 10:13
A more theoretical question on sampling strategies could fit well at stats.stackexchange.com – Paul Hiemstra Feb 02 '12 at 10:25
@Andrie, I got stuck in the mindflow of getting random rows and numbers :). – Paul Hiemstra Feb 02 '12 at 10:26
