I have read a lot of posts related to data wrangling and “repeated” t-test but I can’t figure out the way to achieve it in my case.
You can get my example dataset for StackOverflow here: https://www.dropbox.com/s/0b618fs1jjnuzbg/dataset.example.stckovflw.txt?dl=0
I have a big dataframe of gen expression like:
> b<-read.delim("dataset.example.stckovflw.txt")
> head(b)
animal gen condition tissue LogFC
1 animalcontrol1 kjhss1 control brain 7.129283
2 animalcontrol1 sdth2 control brain 7.179909
3 animalcontrol1 sgdhstjh20 control brain 9.353147
4 animalcontrol1 jdygfjgdkydg21 control brain 6.459432
5 animalcontrol1 shfjdfyjydg22 control brain 9.372865
6 animalcontrol1 jdyjkdg23 control brain 9.541097
> str(b)
'data.frame': 21507 obs. of 5 variables:
$ animal : Factor w/ 25 levels "animalcontrol1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ gen : Factor w/ 1131 levels "dghwg1041","dghwg1086",..: 480 761 787 360 863 385 133 888 563 738 ...
$ condition: Factor w/ 5 levels "control","treatmentA",..: 1 1 1 1 1 1 1 1 1 1 ...
$ tissue : Factor w/ 2 levels "brain","heart": 1 1 1 1 1 1 1 1 1 1 ...
$ LogFC : num 7.13 7.18 9.35 6.46 9.37 ...
Each group has 5 animals, and each animals has many gens quantified. (However, each animal may possibly have a different set of quantified gens, but also many of the gens will be in common between animals and groups).
I would like to perform t-test for each gen between my treated group (A, B, C or D) and the controls. The data should be presented as a table containing the p- value for each gen in each group.
Because I have so many gens (thousand), I cannot subset each gen.
Do you know how could I automate the procedure ?
I was thinking about a loop but I am absolutely not sure it could achieve what I want and how to proceed.
Also, I was looking more at these posts using the apply
function : Apply t-test on many columns in a dataframe split by factor and Looping through t.tests for data frame subsets in r
@andrew_reece : Thank you very much for this. It is almost-exactly what I was looking for. However, I can’t find the way to do it with t-test. ANOVA is interesting information, but then I will need to know which of the treated groups is/are significantly different from my controls. Also I would need to know which treated group is significantly different from each others, “two by two”.
I have been trying to use your code by changing the “aov(..)” in “t.test(…)”. For that, first I realize a subset(b, condition == "control" | condition == "treatmentA" ) in order to compare only two groups. However, when exporting the result table in csv file, the table is unanderstandable (no gen name, no p-values, etc, only numbers). I will keep searching a way to do it properly but until now I’m stuck.
@42:
Thank you very much for these tips. This is just a dataset example, let’s assume we do have to use individual t-tests.
This is very useful start for exploring my data. For example, I have been trying to reprsent my data with Venndiagrams. I can write my code but it is kind of out of the initial topic. Also, I don't know how to summarize in a less fastidious way the shared "gene" detected in each combination of conditions so i have simplified with only 3 conditions.
# Visualisation of shared genes by VennDiagrams :
# let's simplify and consider only 3 conditions :
b<-read.delim("dataset.example.stckovflw.txt")
b<- subset(b, condition == "control" | condition == "treatmentA" | condition == "treatmentB")
b1<-table(b$gen, b$condition)
b1
b2<-subset(data.frame(b1, "control" > 2
|"treatmentA" > 2
|"treatmentB" > 2 ))
b3<-subset(b2, Freq>2) # select only genes that have been quantified in at least 2 animals per group
b3
b4 = within(b3, {
Freq = ifelse(Freq > 1, 1, 0)
}) # for those observations, we consider the gene has been detected so we change the value 0 regardless the freq of occurence (>2)
b4
b5<-table(b4$Var1, b4$Var2)
write.csv(b5, file = "b5.csv")
# make an intermediate file .txt (just add manually the name of the cfirst column title)
# so now we have info
bb5<-read.delim("bb5.txt")
nrow(subset(bb5, control == 1))
nrow(subset(bb5, treatmentA == 1))
nrow(subset(bb5, treatmentB == 1))
nrow(subset(bb5, control == 1 & treatmentA == 1))
nrow(subset(bb5, control == 1 & treatmentB == 1))
nrow(subset(bb5, treatmentA == 1 & treatmentB == 1))
nrow(subset(bb5, control == 1 & treatmentA == 1 & treatmentB == 1))
library(grid)
library(futile.logger)
library(VennDiagram)
venn.plot <- draw.triple.venn(area1 = 1005,
area2 = 927,
area3 = 943,
n12 = 843,
n23 = 861,
n13 = 866,
n123 = 794,
category = c("controls", "treatmentA", "treatmentB"),
fill = c("red", "yellow", "blue"),
cex = 2,
cat.cex = 2,
lwd = 6,
lty = 'dashed',
fontface = "bold",
fontfamily = "sans",
cat.fontface = "bold",
cat.default.pos = "outer",
cat.pos = c(-27, 27, 135),
cat.dist = c(0.055, 0.055, 0.085),
cat.fontfamily = "sans",
rotation = 1);