T-Test For Genes using Apply Function in Dataframe

Question

I’m trying to run a t.test on two data frames.

The data frames (which I carved out of a larger data.frame) have the data I need in rows 1:143. I’ve already created sub-variables because I needed to calculate rowMeans.

> c.mRNA<-rowMeans(c007[1:143,(4:9)])
> h.mRNA<-rowMeans(c007[1:143,(10:15)])

I’m simply trying to run a t.test for each row and then plot the p-values as a histogram. This is what I thought would work:

Pvals<-apply(mRNA143.data,1,function(x) {t.test(x[c.mRNA],x[h.mRNA])$p.value})

But I keep getting this error:

Error in t.test.default(x[c.mRNA], x[h.mRNA]) : 
  not enough 'x' observations

I’ve got something off in my syntax and cannot figure it out for the life of me!

EDIT: I've created a data.frame so that it now has just two columns. I need a p-value for each row. Below is a sample of my data:

      c.mRNA    h.mRNA
1    8.224342  8.520142
2    9.096665 11.762597
3   10.698863 10.815275
4   10.666233 10.972130
5   12.043525 12.140297

I tried this...

 pvals=apply(mRNA143.data,1,function(x) {t.test(mRNA143.data[,1],mRNA143.data[, 2])$p.value})

But I can tell from my plot that something is off (the plot shows a straight line).
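Two things go wrong in this second attempt. First, the anonymous function never uses its argument `x`: it tests the full columns `mRNA143.data[, 1]` and `mRNA143.data[, 2]` on every call, so all 143 p-values are identical. Second, once each group has been reduced to its row mean, every row holds only one value per group, which is not enough for a two-sample t-test. A minimal sketch using the five sample rows shown above, assuming `df` (a hypothetical name) stands in for `mRNA143.data`:

```r
# The five sample rows from the question, standing in for mRNA143.data
df <- data.frame(
  c.mRNA = c(8.224342, 9.096665, 10.698863, 10.666233, 12.043525),
  h.mRNA = c(8.520142, 11.762597, 10.815275, 10.972130, 12.140297)
)

# The function ignores x, so every row gets the same whole-column test
same <- apply(df, 1, function(x) t.test(df[, 1], df[, 2])$p.value)
length(unique(same))  # 1: one p-value repeated for every row

# Using one row instead leaves a single mean per group, which t.test rejects
one_row <- tryCatch(t.test(df[1, 1], df[1, 2])$p.value,
                    error = function(e) conditionMessage(e))
one_row
```

A per-row test therefore needs the individual sample columns for each group, not their means.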

Please make your question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by sharing a minimal dataset so that the community would help you better. — OzanStats, Aug 19 '18 at 01:14
What I'm getting with my code below is one t-test repeated for all 143 rows instead of a new t-test for the data in each row. `> apply(mRNA143.data, 1, function(x) t.test(mRNA143.data[1:143,1:1], mRNA143.data[1:143,2:2])$p.value)` — Oars, Aug 19 '18 at 17:02
I've been able to compute a t.test for each row: `t.test.mRNA143.data<-apply(mRNA143.data, 1, t.test)`. How do I extract the p-values for plotting? — Oars, Aug 19 '18 at 17:35
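On the extraction question: because `t.test` returns a list-like `htest` object, `apply(..., 1, t.test)` returns a list with one `htest` per row, and each element has a `p.value` component that `sapply` can pull out for `hist()`. Note that calling `t.test` on a single row this way runs a one-sample test of that row's values against a mean of 0, not a comparison of the two groups. A sketch with a simulated stand-in for `mRNA143.data` (not the asker's data):

```r
set.seed(42)
# Simulated stand-in for mRNA143.data: 143 rows, two columns of row means
mRNA143.data <- data.frame(c.mRNA = rnorm(143, mean = 10, sd = 1),
                           h.mRNA = rnorm(143, mean = 10.5, sd = 1))

# A list of htest objects, one per row
# (each is a one-sample test of that row's two values against 0)
tests <- apply(mRNA143.data, 1, t.test)

# Pull the p.value component out of every htest object
pvals <- sapply(tests, function(tt) tt$p.value)
hist(pvals, main = "Per-row p-values", xlab = "p-value")
```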

score 0 · Answer 1 · answered Aug 19 '18 at 06:16

A reproducible example would go a long way. While preparing it, you might have realized that you are trying to subset columns using row means, which doesn't really make sense.
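To see why the first attempt fails with "not enough 'x' observations": inside `apply`, `x` is one row of the data, while `c.mRNA` is a vector of row means (values around 8 to 12). Indexing `x` by those means truncates each one to an integer position, and positions past the end of the row return `NA`. `t.test` drops the `NA`s and is left with nothing to test. A minimal sketch; the row and index values below are illustrative, not taken from the asker's data:

```r
# A hypothetical row with 6 values (e.g. 6 samples of one gene)
row_vals <- c(7.9, 8.3, 8.1, 9.0, 9.4, 9.2)

# Row means used as indices: fractional values are truncated (8.22 -> 8)
idx <- c(8.224342, 9.096665, 10.698863)
row_vals[idx]  # NA NA NA: positions 8, 9, 10 do not exist in a length-6 vector

# t.test removes the NAs, then has no observations left
res <- tryCatch(t.test(row_vals[idx], row_vals[idx])$p.value,
                error = function(e) conditionMessage(e))
res
```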

What you want to do is go through the rows of your data, subset the columns belonging to one group, repeat for the second group, and pass both to the t.test function.

This is how I would do it.

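set.seed(1)  # make the simulated data reproducible
# 10 genes (rows) x 5 samples for each of two groups
group1 <- matrix(rnorm(50, mean = 0, sd = 2), ncol = 5)
group2 <- matrix(rnorm(50, mean = 5, sd = 2), ncol = 5)

# columns 1:5 are group 1, columns 6:10 are group 2
xy <- cbind(group1, group2)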

# this is just a visualization of the test you're performing
plot(0, 0, xlim = c(-5, 11), ylim = c(0, 0.25), type = "n")
curve(dnorm(x, mean = 5, sd = 2), add = TRUE)
curve(dnorm(x, mean = 0, sd = 2), add = TRUE)

out <- apply(xy, MARGIN = 1, FUN = function(x) {
  # x is a vector, e.g. xy[i, ] or xy[1, ]
  t.test(x = x[1:5], y = x[6:10])$p.value
})
out
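Applied to the layout in the question, where (judging by the `rowMeans` calls) columns 4:9 of `c007` hold one group and columns 10:15 the other, the same pattern uses the individual sample columns rather than their means, and the resulting vector of p-values goes straight into `hist()`. A sketch assuming that column layout; the `c007` built here is simulated, not real data:

```r
set.seed(1)
# Simulated stand-in for c007: 3 annotation columns, then 6 + 6 sample columns
c007 <- data.frame(id = seq_len(143), a = NA, b = NA,
                   matrix(rnorm(143 * 12, mean = 10), ncol = 12))

# Columns 4:15 as a numeric matrix; within each row, positions 1:6 are
# the original columns 4:9 and positions 7:12 are columns 10:15
expr <- as.matrix(c007[1:143, 4:15])
pvals <- apply(expr, 1, function(x) t.test(x[1:6], x[7:12])$p.value)

hist(pvals, breaks = 20, main = "Per-gene t-test p-values", xlab = "p-value")
```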

Roman - I see how you're creating a new variable called xy. When writing the snippet, do I retain the function(x) and replace the x's with group1, and y with group2 in your example for the t.test? Many thanks! — Oars, Aug 19 '18 at 16:21
@Oars `xy` is your object. `FUN` argument is passed an anonymous function. Arguments in this anonymous function (in this case `x`) should match arguments in its body (e.g. `x[1:5]`). This is the benefit of a function. You change the input, but within the body of the function, everything is predictable. — Roman Luštrik, Aug 19 '18 at 17:56
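To illustrate the comment above: the anonymous function can equally be written as a named function (`row_p` and `gene_row` are hypothetical names). Whatever the argument is called, `apply` passes each row of `xy` into it, and the body must refer to that same name; the data is supplied through `apply`'s first argument, not by editing the function body:

```r
set.seed(1)
xy <- cbind(matrix(rnorm(50, mean = 0, sd = 2), ncol = 5),
            matrix(rnorm(50, mean = 5, sd = 2), ncol = 5))

# `gene_row` receives xy[i, ] on each call; the body uses the same name
row_p <- function(gene_row) {
  t.test(x = gene_row[1:5], y = gene_row[6:10])$p.value
}

out <- apply(xy, MARGIN = 1, FUN = row_p)
out
```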
