I've looked for an answer but I still have not found a solution, I'm still new to R. My data frame displays measurement of one ecological trait (relative soil coverage) for ~70 plant species in different conditions: different years, different chemical treatment and presence/absence of greenhouse.
I need to summarize that data into a new dataframe that displays the mean and sd of the trait for each species and for each combination of factors (conditions). I know aggregate
or lapply
could help but i struggle to combine both grouping by 3 different factors and the multiples species, which implies the need of an "automated" code.
I'm sorry if I missed a post answering my question
Thank you for your patience and help
Edit: here is a reproducible example, hope i do that correctly:
mydata<-structure(list(Year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
2011L), Replicate = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L), Treatment = structure(c(1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A",
"B"), class = "factor"), Greenhouse = structure(c(2L, 2L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("No",
"Yes"), class = "factor"), Sp_1 = c(4L, 0L, 2L, 5L, 4L, 0L, 2L,
5L, 0L, 0L, 4L, 6L, 4L, 0L, 2L, 5L), Sp_2 = c(7L, 0L, 1L, 1L,
7L, 0L, 1L, 1L, 7L, 0L, 1L, 1L, 6L, 0L, 1L, 1L), Sp_3 = c(8L,
2L, 2L, 1L, 8L, 2L, 2L, 1L, 10L, 2L, 1L, 1L, 4L, 2L, 2L, 1L)), class = "data.frame", row.names = c(NA,
-16L))
I only put 3 species in that example but as i've said i have over 70 species so i need something that could select all the species columns ( mydata[,5:75]
? something along those lines) more than c("sp_1","sp_2",..., "sp_70")
.
I'd like the output to look like this:
Year Treatment Greenhouse Sp_1_mean Sp_1_sd Sp_2_mean Sp_2_sd
2010 A Yes x x x x
2010 A No x x x x
2010 B Yes x x x x
2010 B No x x x x
2011 A Yes x x x x
2011 A No x x x x
2011 B Yes x x x x
2011 B No x x x x
Here is a dput()
showing what the desired output should look like
desired_output<-structure(list(Year = c(2010L, 2010L, 2010L, 2010L, 2011L, 2011L,
2011L, 2011L), Treatment = structure(c(1L, 1L, 2L, 2L, 1L, 1L,
2L, 2L), .Label = c("A", "B"), class = "factor"), Greenhouse = structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("No", "Yes"), class = "factor"),
Sp_1_mean = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "x", class = "factor"),
Sp_1_sd = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "x", class = "factor"),
Sp_2_mean = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "x", class = "factor"),
Sp_2_sd = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "x", class = "factor"),
Sp_3_mean = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "x", class = "factor"),
Sp_3_sd = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "x", class = "factor")), class = "data.frame", row.names = c(NA,
-8L))
I hope that's clearer! Thanks