How can I take a sample of n random points from a matrix populated with 1s and 0s?

a=rep(0:1,5)
b=rep(0,10)
c=rep(1,10)
dataset=matrix(cbind(a,b,c),nrow=10,ncol=3)

dataset
      [,1] [,2] [,3]
 [1,]    0    0    1
 [2,]    1    0    1
 [3,]    0    0    1
 [4,]    1    0    1
 [5,]    0    0    1
 [6,]    1    0    1
 [7,]    0    0    1
 [8,]    1    0    1
 [9,]    0    0    1
[10,]    1    0    1

I want to be sure that the positions (row, col) from which I take the n samples are random.

I know sample {base}, but it doesn't seem to allow me to do that. The other methods I know are spatial methods, which would force me to add x, y coordinates, convert the matrix to a spatial object, and then convert it back to a normal matrix.

More information

By random I also mean spread out inside the "matrix space": e.g. if I sample 4 points, I don't want to get 4 neighboring points as a result; I want them spread across the "matrix space".

Knowing the position(row,col) in the matrix where I took out the random points would also be important.

Paul Hiemstra
Gago-Silva
  • Why doesn't sample seem to do what you want? – Paul Hiemstra Feb 02 '12 at 09:45
  • I don't see any option for "random"; maybe this is implicit in sample {base}. What I want to be sure of is that the selected points are spread out, not clustered, inside the matrix. If I take a sample of 10 points, the 10 points should be random in the matrix space. – Gago-Silva Feb 02 '12 at 09:50
  • I agree that the documentation for sample is not really clear about being random, although it is. If you want spread, then random sampling is no guarantee. – Paul Hiemstra Feb 02 '12 at 09:53
  • I added a more philosophical discussion on sampling to my answer. – Paul Hiemstra Feb 02 '12 at 10:07

2 Answers

There is a very easy way to sample a matrix that works if you understand that R represents a matrix internally as a vector.

This means you can use sample directly on your matrix. For example, let's assume you want to sample 10 points with replacement:

n <- 10
replace <- TRUE

Now just use sample on your matrix:

set.seed(1)
sample(dataset, n, replace=replace)
 [1] 1 0 0 1 0 1 1 0 0 1

To demonstrate how this works, let's decompose it into two steps. Step 1 is to generate an index of sampling positions, and step 2 is to find those positions in your matrix:

set.seed(1)
mysample <- sample(length(dataset), n, replace=replace)
mysample
 [1]  8 12 18 28  7 27 29 20 19  2

dataset[mysample]
 [1] 1 0 0 1 0 1 1 0 0 1

And, hey presto, the results of the two methods are identical.
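The question also asks for the (row, col) position of each sampled element. A minimal sketch, assuming the `dataset` matrix from the question: base R's `arrayInd` converts the linear sampling positions back into row and column subscripts. The names `idx`, `pos` and `result` are illustrative only, and sampling without replacement is an assumption made here so that no cell is picked twice.

```r
# Rebuild the 10 x 3 matrix from the question
dataset <- matrix(c(rep(0:1, 5), rep(0, 10), rep(1, 10)), nrow = 10)

n <- 10

# Sample linear positions; replace = FALSE is an assumption so no cell repeats
idx <- sample(length(dataset), n, replace = FALSE)

# Convert linear indices to (row, col) subscripts
pos <- arrayInd(idx, dim(dataset))
colnames(pos) <- c("row", "col")

# One row per sampled point: its position and its value
result <- cbind(pos, value = dataset[idx])
```

Each row of `result` then gives the row, the column and the value of one sampled point.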

Andrie

sample seems the best bet for you. To get 1000 random positions you can do something like:

rows = sample(seq_len(nrow(dataset)), 1000, replace = TRUE)
columns = sample(seq_len(ncol(dataset)), 1000, replace = TRUE)

I think this gives what you want, but of course I could be mistaken.

Extracting the items from the matrix can be done like:

random_sample = mapply(function(row, col) 
                           return(dataset[row,col]), 
                    row = rows, col = columns)
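As an alternative to `mapply`, base R accepts a two-column matrix of (row, col) pairs as an index, which extracts one element per pair. A short sketch, assuming the `dataset` matrix from the question; the seed and the name `via_mapply` are arbitrary choices for illustration.

```r
# Rebuild the 10 x 3 matrix from the question
dataset <- matrix(c(rep(0:1, 5), rep(0, 10), rep(1, 10)), nrow = 10)

set.seed(42)  # arbitrary seed, only for reproducibility
rows    <- sample(seq_len(nrow(dataset)), 1000, replace = TRUE)
columns <- sample(seq_len(ncol(dataset)), 1000, replace = TRUE)

# A two-column index matrix selects one element per (row, col) pair,
# giving a vector of 1000 values rather than a 1000 x 1000 matrix
random_sample <- dataset[cbind(rows, columns)]

# Element-wise extraction with mapply, for comparison
via_mapply <- mapply(function(row, col) dataset[row, col], rows, columns)
```

Both approaches return the same 1000 values; the matrix index simply avoids the explicit function call per element.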

Sampling strategies

In the comments you say that your sample needs to be spread out. A random sample gives no guarantee that there will be no clusters, precisely because it is random. There are several other sampling schemes that might be interesting to explore:

  • Regular sampling: skip the randomness and just sample at regular intervals. This covers the entire matrix space evenly, but there is no randomness.
  • Stratified random sampling: divide your matrix space into regular subsets, and then sample randomly within each subset. This is a mix between random and regular sampling.
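Stratified random sampling as described above could be sketched as follows. This is only one possible reading, not a standard function: `stratified_sample`, `row_blocks` and `col_blocks` are hypothetical names, the blocks are contiguous groups of rows and columns, and exactly one cell is drawn per block. It assumes the number of blocks does not exceed the number of rows or columns.

```r
# Hypothetical helper: draw one random cell from each block of a grid
# laid over the matrix. Assumes row_blocks <= nrow(m), col_blocks <= ncol(m).
stratified_sample <- function(m, row_blocks, col_blocks) {
  # Assign every row and column to a contiguous block (1, 2, ...)
  row_id <- ceiling(seq_len(nrow(m)) * row_blocks / nrow(m))
  col_id <- ceiling(seq_len(ncol(m)) * col_blocks / ncol(m))

  # Every combination of row block and column block is one stratum
  blocks <- expand.grid(rb = seq_len(row_blocks), cb = seq_len(col_blocks))

  pos <- t(mapply(function(rb, cb) {
    rows <- which(row_id == rb)
    cols <- which(col_id == cb)
    # Index into the candidates; avoids sample()'s special case
    # when given a single number
    c(row = rows[sample.int(length(rows), 1)],
      col = cols[sample.int(length(cols), 1)])
  }, blocks$rb, blocks$cb))

  # Positions plus the values found there
  cbind(pos, value = m[pos])
}

# Usage with the matrix from the question:
# 5 row strata x 1 column stratum, i.e. one cell from every pair of rows
dataset <- matrix(c(rep(0:1, 5), rep(0, 10), rep(1, 10)), nrow = 10)
picked <- stratified_sample(dataset, row_blocks = 5, col_blocks = 1)
```

Because each block contributes exactly one point, the sample cannot collapse into a single cluster of rows, while the position inside each block stays random.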

To check if your random sampling produces good results, I'd repeat the random sampling a few times and compare the results (as I assume that the sampling will be input for another analysis?).

Paul Hiemstra
  • This is correct, but then you still have a challenge to extract the elements from your matrix. If you were to do `dataset[rows, columns]` it will result in a 1000*1000 matrix, not a vector of 1000 elements. I gave up on this approach after two minutes, but I'd be interested to see how you solve it. – Andrie Feb 02 '12 at 09:51
  • +1 Nice use of `mapply` (although I think using `sample` directly on the matrix is much simpler). – Andrie Feb 02 '12 at 10:11
  • Yes, this will be used for other analysis. I will check multiple samples to see how it samples the matrix. In the end I will probably try something like stratified random sampling, which seems more appropriate. – Gago-Silva Feb 02 '12 at 10:13
  • A more theoretical question on sampling strategies could fit well at stats.stackexchange.com – Paul Hiemstra Feb 02 '12 at 10:25
  • @Andrie, I got stuck in the mindflow of getting random rows and numbers :). – Paul Hiemstra Feb 02 '12 at 10:26