I would like to draw two samples from a dataset. Actually, I don't want to extract the cases themselves; instead I want to create a new variable sampleNo that indicates which sample (one or two) a case belongs to.

Let's suppose I have a dataset containing 40 cases:

data <- data.frame(var1 = 1:40, var2 = 40:1)

The first sample (n=10) I drew like this:

data$sampleNo <- 0
idx <- sample(seq(1,nrow(data)), size=10, replace=F)
data[idx,]$sampleNo <- 1

Now (and this is where my problems start) I'd like to draw a second sample (n=10), but only from the cases that don't belong to the first sample. Additionally, "var1" must be an even number.

So sampleNo should be 0 for cases that were not drawn at all, 1 for cases that belong to the first sample and 2 for cases belonging to the second sample (= sampleNo equals 0 and var1 is even).

I was trying to solve it like this:

idx2<-data$var1%%2 & data$sampleNo==0
sample(data[idx2,], size=10, replace=F)

But how can I set sampleNo to 2?

D. Studer

2 Answers

We can use the setdiff function as follows:

sample(setdiff(1:nrow(data), idx), 3, replace = F)

setdiff(x, y) will select the elements of x that are not in y:

setdiff(x = 1:20, y = seq(2,20,2))
 [1]  1  3  5  7  9 11 13 15 17 19

Applied to the example above:

data$sampleNo2 <- 0
idx2 <- sample(setdiff(1:nrow(data), idx), 3, replace = F)
data[idx2,]$sampleNo2 <- 1
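
To address the original question directly, the same `setdiff` idea could write 2 into the existing `sampleNo` column instead of creating `sampleNo2`. The following is a minimal sketch, assuming the question's 40-row data and its sample size of 10; the `set.seed` call and the name `pool` are additions for illustration only.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
data <- data.frame(var1 = 1:40, var2 = 40:1)

# First sample, as in the question
data$sampleNo <- 0
idx <- sample(seq_len(nrow(data)), size = 10)
data$sampleNo[idx] <- 1

# Candidate rows: even var1, minus the rows already in sample 1
pool <- setdiff(which(data$var1 %% 2 == 0), idx)
idx2 <- sample(pool, size = 10)
data$sampleNo[idx2] <- 2
```

Here `which(data$var1 %% 2 == 0)` gives the row positions with an even `var1`, and `setdiff` drops those already in the first sample. Since at most 10 of the 20 even rows can be in sample 1, the pool always holds at least 10 rows.
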
Dave Gruenewald
bouncyball
  • Thank you, very helpful! – D. Studer Sep 25 '17 at 15:58
  • And what if I wanted to add another condition when drawing the second sample? Let's say that the second sample should only be drawn where sampleNo!=0 **AND var1 is an odd number**? – D. Studer Sep 25 '17 at 16:23
  • `seq(2, x, 2)` will generate the even numbers you wish to exclude, you could just wrap it in `setdiff` (I'm sure there are a few other ways, too) `setdiff(setdiff(1:nrow(data), idx), seq(2, nrow(data), 2))` – bouncyball Sep 25 '17 at 16:25
  • But your suggestion only accounts for the odd numbers, not for the first condition (sampleNo==0) too. I tried to use idx<-data$sampleNo==1 & data$var1%%2 when using the second sample-function – D. Studer Sep 25 '17 at 16:37
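
The two approaches discussed in these comments can be compared side by side. This is only a sketch, assuming the question's data (where `var1` happens to equal the row number) and the odd-number condition from the comment; `pool_setdiff` and `pool_mask` are illustrative names. Because `idx` holds exactly the rows with `sampleNo == 1`, removing `idx` already enforces `sampleNo == 0`, so both pools should coincide. Writing the comparison out as `%% 2 == 1` (odd) or `%% 2 == 0` (even) avoids relying on the implicit coercion of a bare `data$var1 %% 2`, which is treated as `TRUE` only for odd values.

```r
set.seed(2)  # assumption: seed fixed only for reproducibility
data <- data.frame(var1 = 1:40, var2 = 40:1)
data$sampleNo <- 0
idx <- sample(seq_len(nrow(data)), size = 10)
data$sampleNo[idx] <- 1

# Approach from the comment: nested setdiff on row positions.
# This relies on var1 being identical to the row number.
pool_setdiff <- setdiff(setdiff(seq_len(nrow(data)), idx),
                        seq(2, nrow(data), by = 2))

# Alternative: logical mask on the actual column values
pool_mask <- which(data$sampleNo == 0 & data$var1 %% 2 == 1)

setequal(pool_setdiff, pool_mask)  # expected to be TRUE for this data
```

If `var1` did not match the row number, only the mask version would still test the column values themselves.
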

Here is a complete solution to your problem, more along the lines of your original idea. The code could be shortened, but I tried to make it as transparent as possible.

# Data
data <- data.frame(var1 = 1:40, var2 = 40:1) 

# Add SampleNo column
data$sampleNo <- 0L

# Randomly select 10 rows as sample 1
pool_idx1 <- 1:nrow(data)
idx1 <- sample(pool_idx1, size = 10)
data[idx1, ]$sampleNo <- 1L

# Draw a second sample from cases where sampleNo != 1 & var1 is even 
pool_idx2 <- pool_idx1[data$var1 %% 2 == 0 & data$sampleNo != 1]
idx2 <- sample(pool_idx2, size = 10)
data[idx2, ]$sampleNo <- 2L
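
A quick check of the result, plus one caveat, may be useful here. The sketch below repeats the steps above with a fixed seed (an assumption for reproducibility) and tabulates `sampleNo`. It also defines `safe_sample`, a hypothetical helper that is not part of the answer: base R's `sample()` treats a single number `n` as the range `1:n`, so a pool that shrinks to one row index could otherwise yield the wrong rows.

```r
set.seed(3)  # assumption: fixed seed only for reproducibility
data <- data.frame(var1 = 1:40, var2 = 40:1)
data$sampleNo <- 0L

# Hypothetical helper: always samples from the elements of x,
# even when x has length 1
safe_sample <- function(x, size) x[sample.int(length(x), size)]

idx1 <- safe_sample(seq_len(nrow(data)), 10)
data$sampleNo[idx1] <- 1L

pool_idx2 <- which(data$var1 %% 2 == 0 & data$sampleNo != 1L)
idx2 <- safe_sample(pool_idx2, 10)
data$sampleNo[idx2] <- 2L

# Counts per group: 20 undrawn, 10 in sample 1, 10 in sample 2
table(data$sampleNo)

safe_sample(7L, 1)  # returns 7, whereas sample(7L, 1) draws from 1:7
```

With the question's data the pool never drops below 10 rows, so the helper is a precaution for other datasets rather than a fix for this one.
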
s_baldur