Big data df manipulation to run t.test() in R

Question

My (pseudo) df is in this form.

main var       mean
1   age        68.6
1   age        73.3
2   click      40.6
2   click      41
2   click      35
3   pip8kHz    34.2
3   pip8kHz    77.3

The original df has 100,000 entries in df$mean. I need to split the df on df$main and then, within each sub-df, run t.test(x, y) in a loop where x is a growing prefix of df$mean and y is the rest of the values: in the first iteration x = df$mean[1] and y = df$mean[-1]; in the second, x = df$mean[1:2] and y = df$mean[-c(1:2)]; and so on.

I split the data on df$main and wrote the following code, but it gives an error:

library(data.table)
library(plyr)
new <- function(df) {
  upper=0
  ds <- split(df, f = df$main) #this col have rows on which I split the df
  s <- sapply(ds, function(x) sort(x$mean)) # this is the col with mean values needed for t.test()
  n <- mapply(FUN = function(x) {sort(x)}, x = s)
  l <- mapply(FUN = function(x) {length(x)}, x = s)
  for (j in l) {
    for(i in 1:(j-1)) {
      for (k in n) {
        upper <- c(upper, k[i])
        lower <- k[-c(upper:i)]
        t <- t.test(upper, lower, var.equal = TRUE, na.rm=TRUE)
        print(t[['p.value']])
      }
    }
  }
}

Error in t.test.default(upper, lower, var.equal = TRUE, na.rm = TRUE) : 
not enough 'y' observations
In addition: There were 50 or more warnings (use warnings() to see the 
first 50)

 > warnings()
 Warning messages:
 1: In -upper:-i : numerical expression has 2 elements: only the first used

Can you please suggest how I can improve the code to get the desired results? Thanks

Can you please post an actual sample of your code (i.e., with `head` and `dput`) that shows enough to attack the problem. Can you also edit your question to specify **what** you want, not **how** you want it. If it's just a matter of iterating `t.test` over values, there are better ways than three nested loops — Conor Neilson, Mar 20 '18 at 19:23
Hi, it is explained in my first three lines. I have a df$col with 100,000 entries, and in each loop I need to take a growing set of values from df$col as 'x' and the rest of df$col as 'y' (x = s[1], y = s[-1]; then x = s[1:2], y = s[-c(1:2)]) to perform t.test(x,y). This is the actual code I have written so far, but it gives an error. Is my question understandable now? Thanks — user3698773, Mar 20 '18 at 19:33
Your first error `not enough 'y' observations` says that `t.test` needs at least two values for both `x` and `y` to perform the calculation — bouncyball, Mar 20 '18 at 19:52
@user3698773 Right, but it is much easier for people to answer your question if you include a reproducible form of your data with `dput`. That means we can run it in our own sessions. Please read the [guide to reproducible examples](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — Conor Neilson, Mar 20 '18 at 22:42
Thanks for the info, I have edited and updated the question with my (pseudo) data. Please suggest !! — user3698773, Mar 20 '18 at 23:11
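
One possible way to get what the question describes without three nested loops, offered as a minimal sketch and not a tested solution for the full 100,000-row data: split df$mean by df$main, sort each group, and run t.test() once per split point. In the posted code, `upper` starts at 0 and keeps growing across all loops, so `upper:i` uses only its first element and `lower` can end up empty, which would explain the "not enough 'y' observations" error. The helper name `split_p_values` is hypothetical, and the data frame below is only the pseudo data from the question. As far as I know, with var.equal = TRUE t.test() stops when either side is empty or when there are fewer than three observations in total, so groups that small are skipped here.

```r
# Pseudo data copied from the question (the real df has ~100,000 rows).
df <- data.frame(
  main = c(1, 1, 2, 2, 2, 3, 3),
  var  = c("age", "age", "click", "click", "click", "pip8kHz", "pip8kHz"),
  mean = c(68.6, 73.3, 40.6, 41, 35, 34.2, 77.3)
)

# Hypothetical helper: p-values for every split point of one group's values.
# Split i compares the i smallest values (x) against the remaining ones (y).
split_p_values <- function(v) {
  v <- sort(v[!is.na(v)])          # sort() as in the question; drop NAs first
  n <- length(v)
  if (n < 3) return(numeric(0))    # too few observations for a pooled t-test
  vapply(seq_len(n - 1), function(i) {
    tryCatch(
      t.test(v[seq_len(i)], v[-seq_len(i)], var.equal = TRUE)$p.value,
      error = function(e) NA_real_ # e.g. "data are essentially constant"
    )
  }, numeric(1))
}

# One numeric vector of p-values per level of df$main.
p_by_group <- lapply(split(df$mean, df$main), split_p_values)
print(p_by_group)
```

With the pseudo data only group 2 has enough rows, so groups 1 and 3 come back empty. Each t.test() call rescans the whole group, so for very large groups this is quadratic in the group size and may be slow.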
