I have a number of columns that are numbered sequentially, e.g. q1, q2, q3, etc. I also have an indicator variable (ind) giving the control or treatment status of each observation. I want to run a series of t-tests for a difference in means between the treatment and control groups, one per question. Rather than typing them out question by question, I'd like a loop that collects the p-values from all the tests into a matrix.

I think the problem is how I'm using paste(). I'm failing to create an object that actually refers to the data, so R seems to be running t.test on the text itself rather than on the data the text is meant to reference.

data <- data.frame(matrix(NA,50,8))

colnames(data) <- c("q1","q2","q3","q4","q5","q6","q7","ind")

data[,"ind"]<- c(rep(0,25),rep(1,25))

set.seed(42)
data[,"q1"] <- rnorm(50)
data[,"q2"] <- rnorm(50)
data[,"q3"] <- rnorm(50)
data[,"q4"] <- rnorm(50)
data[,"q5"] <- rnorm(50)
data[,"q6"] <- rnorm(50)
data[,"q7"] <- rnorm(50)

results <- data.frame(matrix(NA,7,2))

## Attempt One
for(i in 1:7){
results[i,1] <- i
a <- paste0("data$q",i,"[data$ind==1]")
b <- paste0("data$q",i,"[data$ind==0]")
results[i,2] <- t.test(a,b)[3]
}

####
# Error in t.test.default(a, b) : not enough 'x' observations
# In addition: Warning messages:
# 1: In mean.default(x) : argument is not numeric or logical: returning NA
# 2: In var(x) : NAs introduced by coercion


###Attempt Two
for(i in 1:7){
results[i,1] <- i
a <- get(paste0("data$q",i,"[data$ind==1]"))
b <- get(paste0("data$q",i,"[data$ind==0]"))
results[i,2] <- t.test(a,b)[3]
}

####
# Error in get(paste0("data$q", i, "[data$ind==1]")) : 
#  object 'data$q1[data$ind==1]' not found

I have found many discussions about how to create variable names with paste in R, but I'm looking for how to call a variable name with paste in R:

1. how to assign to the names() attribute of the value of a variable in R
2. How to name variables on the fly?
3. Accessing variables with a for loop, as attempted in "Attempt Two" above: Change variable name in for loop using R

Dr. Beeblebrox

1 Answer


If you are trying to manipulate variables as character strings, that is a sure sign that you are barking up the wrong tree. Any time you are tempted to use get or assign, think again: you're probably doing it wrong.

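library(plyr)
library(reshape2)
# Reshape to long format: one row per (ind, variable, value)
data_m <- melt(data, id.vars = "ind")
# One t-test per question; [[3]] extracts the p-value
ddply(data_m, .(variable),
      function(x) t.test(x$value[x$ind == 1], x$value[x$ind == 0])[[3]])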

Or just:

lapply(data[,1:7],function(x) t.test(x[data$ind == 1],x[data$ind == 0])[[3]])
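
If you do want to keep the explicit loop from the question, one possible fix is to paste together only the column name and look the column up with `[[`, instead of building a whole expression as text. This is a minimal sketch on simulated data of the same shape as the question's example; the names `col` and `x` and the column labels of `results` are illustrative.

```r
# Simulated data shaped like the question's example: q1..q7 plus a 0/1 indicator
set.seed(42)
data <- data.frame(matrix(rnorm(50 * 7), 50, 7))
colnames(data) <- paste0("q", 1:7)
data$ind <- c(rep(0, 25), rep(1, 25))

results <- data.frame(question = 1:7, p.value = NA_real_)
for (i in 1:7) {
  col <- paste0("q", i)   # only the column name is built as a string
  x <- data[[col]]        # `[[` looks the column up by that name
  results$p.value[i] <- t.test(x[data$ind == 1], x[data$ind == 0])$p.value
}
results
```

The string is used as a name to index with, so neither get() nor eval(parse()) is needed.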
joran
  • First time seeing `melt`. Very interesting. I'm wondering, the variables I'm interested in looping through are only a few from a much larger data frame. How will I use the approach above when I'm interested in looping through ~20 numerically sequenced variables in a dataset with ~300 columns? – Dr. Beeblebrox Jun 18 '13 at 20:06
  • @DTRM Simple, only melt a subset of your data frame. Same idea applies with `lapply` (or use `sapply` to get a vector rather than a list). – joran Jun 18 '13 at 20:10
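
A sketch of that subsetting idea, assuming a hypothetical wide data frame `big` in which the columns of interest are named q1 to q20 and the indicator `ind` is coded 0/1. The data here are simulated and every name is illustrative, not taken from the asker's real data.

```r
# Hypothetical wide data frame: 300 columns, of which only q1..q20 matter
set.seed(1)
big <- data.frame(matrix(rnorm(40 * 300), 40, 300))
colnames(big)[1:20] <- paste0("q", 1:20)
big$ind <- rep(c(0, 1), each = 20)

# Select the columns by name, then run one t-test per column
qcols <- paste0("q", 1:20)
pvals <- sapply(big[qcols],
                function(x) t.test(x[big$ind == 1], x[big$ind == 0])$p.value)
pvals   # named numeric vector, one p-value per question
```

The same character vector `qcols` could be used to restrict what gets passed to melt, e.g. `big[c(qcols, "ind")]`.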
  • OK, thank you! I tried the `lapply` option, but I get this error: `not enough 'y' observations` Any ideas? – Dr. Beeblebrox Jun 18 '13 at 20:19
  • @DTRM I'm not a mind reader. My code works flawlessly on the example you provided. – joran Jun 18 '13 at 20:20
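
The real data were never posted, so the cause of `not enough 'y' observations` is unknown. One thing that would produce it is an indicator that is not coded 0/1 in the real data (or a column that is all NA in one group), which leaves the second argument to t.test empty. The sketch below uses a made-up data frame `d` with `ind` coded 1/2 to show how that could be checked; it is a guess at the cause, not a diagnosis.

```r
# Hypothetical data where the indicator is coded 1/2 instead of 0/1
set.seed(1)
d <- data.frame(q1 = rnorm(10), ind = rep(c(1, 2), each = 5))

table(d$ind, useNA = "ifany")        # shows which codes actually occur
n_control <- length(d$q1[d$ind == 0])
n_control                            # no rows match ind == 0

# try() captures the error so the rest of a script keeps running
res <- try(t.test(d$q1[d$ind == 1], d$q1[d$ind == 0]), silent = TRUE)
inherits(res, "try-error")
```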