I have data split across two data frames. I need to subset the data in one of them first, and then run a t-test between each subset and the entire data from the other data frame.
I attempted to use %>% and group_by() to select the data I want, and then I tried to invoke the t-test as shown below.
library(dplyr)
a <- c("AA","AA","AA","AB","AB","AB")
b <- c(1,2,3,1,2,3)
c <- c(12,34,56,78,90,12)
cols1 <- c("SampID", "Reps", "Vals")
df1 <- data.frame(a,b,c)
colnames(df1) <- cols1
df1
  SampID Reps Vals
1     AA    1   12
2     AA    2   34
3     AA    3   56
4     AB    1   78
5     AB    2   90
6     AB    3   12
e <- c(1,2,3,4,5,6,7,8,9)
f <- c(11,22,33,44,55,66,77,88,99)
cols2 <- c("CtrlReps","CtrlVals")
df2 <- data.frame(e,f)
colnames(df2) <- cols2
df2
  CtrlReps CtrlVals
1        1       11
2        2       22
3        3       33
4        4       44
5        5       55
6        6       66
7        7       77
8        8       88
9        9       99
df1 %>%
  group_by(SampID) %>%
  t.test(Vals, df2$CtrlVals, var.equal = FALSE)
This, however, returns an error:
Error in match.arg(alternative) :
'arg' must be NULL or a character vector
I also tried using do(), but that returns an error as well:
outputs <- df1 %>%
  group_by(SampID) %>%
  do(tpvals = t.test(Vals, df2$CtrlVals, data = ., paired = FALSE, var.equal = FALSE)) %>%
  summarise(SampID, pvals = tpvals$p.value)
Error in t.test(Vals, df2$CtrlVals, data = ., paired = FALSE, var.equal = FALSE) :
object 'Vals' not found
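To make the goal concrete, here is a rough sketch of the result I am after, written out by hand in base R for each SampID instead of with the pipe. The names tt_AA, tt_AB, and wanted are just placeholders I made up for illustration, and I am assuming that a Welch two-sample test of each sample's Vals against all of CtrlVals is the right comparison. This obviously does not scale to many sample IDs, which is why I was hoping for a grouped approach.

```r
# Rebuild the example data so this block runs on its own
df1 <- data.frame(
  SampID = c("AA", "AA", "AA", "AB", "AB", "AB"),
  Reps   = c(1, 2, 3, 1, 2, 3),
  Vals   = c(12, 34, 56, 78, 90, 12)
)
df2 <- data.frame(
  CtrlReps = 1:9,
  CtrlVals = c(11, 22, 33, 44, 55, 66, 77, 88, 99)
)

# Welch t-test of each sample's Vals against ALL control values
# (one call per SampID, written out manually)
tt_AA <- t.test(df1$Vals[df1$SampID == "AA"], df2$CtrlVals, var.equal = FALSE)
tt_AB <- t.test(df1$Vals[df1$SampID == "AB"], df2$CtrlVals, var.equal = FALSE)

# Desired shape of the output: one row and one p-value per SampID
wanted <- data.frame(
  SampID = c("AA", "AB"),
  pvals  = c(tt_AA$p.value, tt_AB$p.value)
)
wanted
```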
I am new to R and have exhausted my Google-Fu, so I have no idea what is happening. As far as I can tell, the two errors are unrelated, but resolving either one would give me a way forward; I just don't know how. I also expect that resolving this problem will immediately land me in the next one (the one this post actually addresses).
Any input, guidance, or help would be much appreciated!