I would like to perform pairwise comparisons (using t tests) between every pair of species in the iris dataset to see which species differ significantly in which variables. That is, each comparison would test all values of one species for a given variable against all values of another species for the same variable. Below are all possible pairwise comparisons for the iris dataset:
data(iris)
setosa.only <- iris[iris$Species == "setosa", ]
versicolor.only <- iris[iris$Species == "versicolor", ]
virginica.only <- iris[iris$Species == "virginica", ]
# setosa vs versicolor
t.test(setosa.only$Sepal.Length, versicolor.only$Sepal.Length)
t.test(setosa.only$Sepal.Width, versicolor.only$Sepal.Width)
t.test(setosa.only$Petal.Length, versicolor.only$Petal.Length)
t.test(setosa.only$Petal.Width, versicolor.only$Petal.Width)
# setosa vs virginica
t.test(setosa.only$Sepal.Length, virginica.only$Sepal.Length)
t.test(setosa.only$Sepal.Width, virginica.only$Sepal.Width)
t.test(setosa.only$Petal.Length, virginica.only$Petal.Length)
t.test(setosa.only$Petal.Width, virginica.only$Petal.Width)
# versicolor vs virginica
t.test(versicolor.only$Sepal.Length, virginica.only$Sepal.Length)
t.test(versicolor.only$Sepal.Width, virginica.only$Sepal.Width)
t.test(versicolor.only$Petal.Length, virginica.only$Petal.Length)
t.test(versicolor.only$Petal.Width, virginica.only$Petal.Width)
Running these comparisons one by one is easy with a small dataset such as iris (3 species pairs × 4 variables = 12 comparisons), but I would like to apply this to larger datasets with dozens of species and variables (and thus hundreds of possible comparisons). How could I run the comparisons above with a single command, or just a few, so that they scale to larger datasets? With my limited knowledge of R, I have not been able to figure this out and would be grateful for any suggestions.
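One way to avoid writing each call by hand is to loop over every species pair (generated with `combn()`) and every numeric column. The sketch below assumes the grouping column is `Species` and that all other numeric columns should be tested; `group_col`, `vars`, `pairs`, and `results` are illustrative names, not part of the question. It keeps the same `t.test()` defaults as the calls above (a two-sided Welch two-sample test).

```r
# Minimal sketch: one Welch t test (the t.test() default) per species pair
# and per numeric variable, collected into a long-format data frame.
data(iris)

group_col <- "Species"                          # assumed grouping column
vars <- names(iris)[sapply(iris, is.numeric)]   # the four measurement columns
groups <- levels(factor(iris[[group_col]]))     # works for factor or character
pairs <- combn(groups, 2)                       # one column per species pair

results <- do.call(rbind, lapply(vars, function(v) {
  do.call(rbind, lapply(seq_len(ncol(pairs)), function(k) {
    g1 <- pairs[1, k]
    g2 <- pairs[2, k]
    x <- iris[iris[[group_col]] == g1, v]
    y <- iris[iris[[group_col]] == g2, v]
    data.frame(variable = v, group1 = g1, group2 = g2,
               p.value = t.test(x, y)$p.value)
  }))
}))

# Flag pairs that differ at the 0.05 level
results$significant <- results$p.value < 0.05
results
```

Replacing `iris` and `group_col` with another data frame and grouping column should extend this to any number of species and variables without adding more calls.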
In addition, I would like an output summarizing all pairwise comparisons. It could be a matrix of TRUE/FALSE (or an equivalent such as 1/0 or YES/NO) indicating which species pairs differ significantly in which variables (i.e., TRUE where the t test gives p < 0.05). A single matrix covering all species and all variables at once may be hard to read, so it could be one matrix per variable. For example, the desired output matrix for Sepal.Length would look something like:
           setosa  versicolor  virginica
setosa     NA      YES         YES
versicolor YES     NA          YES
virginica  YES     YES         NA
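A minimal sketch of the per-variable matrix, assuming a two-sided Welch `t.test()` and a 0.05 threshold. `signif_matrix()` is a hypothetical helper written for this illustration, not a base R function; it fills both triangles of a square matrix with "YES"/"NO" and leaves `NA` on the diagonal, following the layout above.

```r
# Hypothetical helper: square YES/NO matrix for one variable, NA on diagonal
signif_matrix <- function(data, var, group, alpha = 0.05) {
  groups <- levels(factor(data[[group]]))
  out <- matrix(NA_character_, nrow = length(groups), ncol = length(groups),
                dimnames = list(groups, groups))
  pairs <- combn(groups, 2)
  for (k in seq_len(ncol(pairs))) {
    g1 <- pairs[1, k]
    g2 <- pairs[2, k]
    p <- t.test(data[data[[group]] == g1, var],
                data[data[[group]] == g2, var])$p.value
    # Write the same answer into both cells so the matrix is symmetric
    out[g1, g2] <- out[g2, g1] <- if (p < alpha) "YES" else "NO"
  }
  out
}

data(iris)
vars <- names(iris)[sapply(iris, is.numeric)]
# Named list with one matrix per variable, e.g. by_var$Sepal.Length
by_var <- lapply(setNames(vars, vars), signif_matrix,
                 data = iris, group = "Species")
by_var$Sepal.Length
```

Keeping one matrix per variable in a named list means each can be printed or inspected on its own, which matches the suggestion of one matrix per variable.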
Alternatively, the output could be an array like the one returned by the code below:
tapply(X = iris$Sepal.Length, INDEX = iris$Species, FUN = summary)
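As a more compact alternative, the stats function `pairwise.t.test()` returns the p-values for all pairs of groups for one variable. With `pool.sd = FALSE` it runs a separate `t.test()` for each pair, and `p.adjust.method = "none"` leaves the p-values unadjusted, so they should match the individual calls above. The sketch below, which assumes a 0.05 threshold, collects them into a matrix with one row per species pair and one column per variable; the names `pvals` and `pm` are illustrative.

```r
data(iris)
vars <- names(iris)[sapply(iris, is.numeric)]

# One column per variable; pairwise.t.test() returns a lower-triangular
# p-value matrix, so keep only the non-NA cells and name them by pair.
pvals <- sapply(vars, function(v) {
  pm <- pairwise.t.test(iris[[v]], iris$Species,
                        p.adjust.method = "none", pool.sd = FALSE)$p.value
  keep <- !is.na(pm)
  setNames(pm[keep], paste(rownames(pm)[row(pm)[keep]],
                           colnames(pm)[col(pm)[keep]], sep = " vs "))
})

pvals < 0.05   # TRUE where the pair differs at alpha = 0.05
```

With hundreds of comparisons, `p.adjust.method` also accepts corrections such as `"holm"` or `"bonferroni"` in place of `"none"`.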