I have a dataframe df with plot names and values for species found on those plots:

df=data.frame(plot=c(1000, 1000, 1000, 1005, 1005, 1005, 1009, 1009, 1009), speciesA=c(5, 0.5, 10, 7, 8, 45, 0.2, 3, 17), speciesB = c(1, 11, 46, 98, 0.2, 14, 40, 37, 22), speciesC = c(0.7, 72, 17, 0, 14, 8, 0, 9, 0.9))

Now I want to find the mean for each species for each plot, resulting in a dataframe df2 like:

df2=data.frame(plot=c(1000,1005,1009), speciesA=c(5.167, 20, 6.733), speciesB=c(19.333, 37.4, 33), speciesC=c(29.9, 7.333, 3.3))

I've tried:

df2 <- df %>% group_by(plot) %>% summarise_each(funs(mean))

This just made the program completely unresponsive. I also know that I can collapse the rows using:

df2 <- df[, lapply(.SD, paste0, collapse=""), by=plot]

But this just collapses all of the numbers together, and I'm not sure how to use it to calculate the mean. I've already tried searching here for answers, but I apologize if this is still a duplicate question.

tk3
Kactus
  • Try `df2 <- df[, lapply(.SD, mean), by=plot]` and make sure you are using a `data.table` instead of a `data.frame` – AshOfFire May 16 '18 at 12:59
  • change `summarise_each` to `summarise_all` like `df %>% group_by(plot) %>% summarise_all(funs(mean))` as summarise_each is deprecated and will disappear in the future. That said, at the moment there is no reason why it shouldn't work. It works on my machine. – phiver May 16 '18 at 13:02
  • At what point is size going to be an issue when running this? The data.table that I'm working with has about 1000 observations and 1200 variables. When I try to run either of these I don't get an error, but R just becomes unresponsive and I have to restart. – Kactus May 16 '18 at 13:21
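
For reference, the two suggestions in the comments can be sketched on the example data from the question roughly as follows. This is only a minimal sketch, assuming the `data.table` and `dplyr` packages are installed. The names `dt`, `df2_dt`, `df2_dplyr` and `df2_base` are made up for illustration, and the base R `aggregate` call is an extra cross-check that nobody in the thread proposed. It does not address the unresponsiveness reported on the larger table.

```r
library(data.table)
library(dplyr)

# Example data copied from the question
df <- data.frame(
  plot     = c(1000, 1000, 1000, 1005, 1005, 1005, 1009, 1009, 1009),
  speciesA = c(5, 0.5, 10, 7, 8, 45, 0.2, 3, 17),
  speciesB = c(1, 11, 46, 98, 0.2, 14, 40, 37, 22),
  speciesC = c(0.7, 72, 17, 0, 14, 8, 0, 9, 0.9)
)

# data.table route (first comment): the [, j, by = ] syntax needs a
# data.table, so convert the data.frame first (dt is a hypothetical name)
dt <- as.data.table(df)
df2_dt <- dt[, lapply(.SD, mean), by = plot]

# dplyr route (second comment): summarise_all in place of summarise_each;
# the function is passed directly here instead of through funs()
df2_dplyr <- df %>%
  group_by(plot) %>%
  summarise_all(mean)

# Base R cross-check without extra packages (an assumption, not from the thread)
df2_base <- aggregate(. ~ plot, data = df, FUN = mean)

print(df2_dt)
```

Each route should return one row per plot with the per-plot mean of every species column.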

0 Answers