My (simplified) df has this form:
main var mean
1 age 68.6
1 age 73.3
2 click 40.6
2 click 41
2 click 35
3 pip8kHz 34.2
3 pip8kHz 77.3
df$mean has 100,000 entries in the original df. I need to split the df on df$main and then, within each (sub) df after the split, loop over df$mean: in each iteration, take the first values as x and the rest as y (first loop: x = df$mean[1], y = df$mean[-1]; second loop: x = df$mean[1:2], y = df$mean[-c(1:2)]; and so on), and perform t.test(x, y) in each iteration.
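To make the intended procedure concrete, here is a minimal sketch on the toy data above. It is only an illustration under assumptions, not a tested solution for the full data: the helper name `prefix_ttests` is hypothetical, the sorting step is an assumption carried over from my attempt, and groups with fewer than 3 values are skipped because t.test() with var.equal = TRUE needs at least 3 observations in total.

```r
# Toy data copied from the example above
df <- data.frame(
  main = c(1, 1, 2, 2, 2, 3, 3),
  var  = c("age", "age", "click", "click", "click", "pip8kHz", "pip8kHz"),
  mean = c(68.6, 73.3, 40.6, 41, 35, 34.2, 77.3)
)

# Hypothetical helper: for one vector of values, test the first i values (x)
# against the remaining values (y), for i = 1 .. n - 1.
prefix_ttests <- function(v) {
  v <- sort(v)  # assumption: sorted order; drop this line to keep row order
  n <- length(v)
  # Assumption: skip groups that are too small for a pooled-variance t-test
  if (n < 3) return(numeric(0))
  sapply(seq_len(n - 1), function(i) {
    x <- v[seq_len(i)]   # first i values
    y <- v[-seq_len(i)]  # everything after the first i values
    t.test(x, y, var.equal = TRUE)$p.value
  })
}

# Split df$mean by df$main and apply the helper to each group
p_values <- lapply(split(df$mean, df$main), prefix_ttests)
print(p_values)
```

With this toy data only group 2 has enough rows to be tested; groups 1 and 3 return an empty vector.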
I split the data on df$main and wrote the following code, but I am getting an error:
library(data.table)
library(plyr)
new <- function(df) {
  upper=0
  ds <- split(df, f = df$main) #this col have rows on which I split the df
  s <- sapply(ds, function(x) sort(x$mean)) # this is the col with mean values needed for t.test()
  n <- mapply(FUN = function(x) {sort(x)}, x = s)
  l <- mapply(FUN = function(x) {length(x)}, x = s)
  for (j in l) {
    for(i in 1:(j-1)) {
      for (k in n) {
        upper <- c(upper, k[i])
        lower <- k[-c(upper:i)]
        t <- t.test(upper, lower, var.equal = TRUE, na.rm=TRUE)
        print(t[['p.value']])
      }
    }
  }
}
Error in t.test.default(upper, lower, var.equal = TRUE, na.rm = TRUE) :
not enough 'y' observations
In addition: There were 50 or more warnings (use warnings() to see the
first 50)
> warnings()
Warning messages:
1: In -upper:-i : numerical expression has 2 elements: only the first used
Could you please suggest how I can improve the code to get the desired results? Thanks.