I want to generate a vector of means derived from subsets of an existing vector in R.

My data look like this:

date    plant_ID    treatment   stalk_count flower_count
195     1           control     0           0
196     1           control     0           0
197     1           control     0           0
198     1           control     0           0
.........................................................
237     98          treatment   0           0
239     98          treatment   0           0
226     98          treatment   2           9 

I think I need to use split() to break the data into subsets by plant_ID, but I do not know how to tell lapply() to take these subsets and apply mean() to the flower_count values in each one.

My questions are: (1) Is this an approach that will work? (2) How would I write the code to do this?

JKO

1 Answer

We don't need to split. We can get the mean of 'flower_count' for each 'plant_ID' with a group-by operation, using aggregate from base R:

aggregate(flower_count~plant_ID, df1, FUN = mean)

Or, using dplyr:

library(dplyr)
df1 %>%
   group_by(plant_ID) %>%
   summarise(flowercountMean = mean(flower_count))

If we specifically want to use lapply with split:

lapply(split(df1$flower_count, df1$plant_ID), mean)
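
Since the question asks for a vector of means, note that lapply returns a list rather than a vector. Below is a minimal sketch of two base R ways to get a named numeric vector instead. The small `df1` here uses made-up values purely for illustration, not the asker's real data.

```r
# Toy data with made-up values, standing in for the asker's data frame
df1 <- data.frame(
  plant_ID     = c(1, 1, 2, 2, 3),
  flower_count = c(0, 4, 2, 9, 5)
)

# lapply() returns a list with one element per plant_ID
means_list <- lapply(split(df1$flower_count, df1$plant_ID), mean)

# sapply() simplifies the same result to a named numeric vector
means_vec <- sapply(split(df1$flower_count, df1$plant_ID), mean)

# tapply() does the split and the apply in a single call
means_tapply <- tapply(df1$flower_count, df1$plant_ID, mean)

print(means_vec)
```

The names of the resulting vector are the plant_ID values, so a single plant's mean can be looked up with, for example, `means_vec["2"]`.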
akrun
  • Thank you akrun, that's quite helpful. I'm not able to get it to run yet however. What does the "df1" term mean? – JKO Jan 23 '17 at 20:04
  • @JKO Suppose you read the dataset with `df1 <- read.csv("yourdata.csv", header=TRUE, stringsAsFactors=FALSE)`; that `df1` is the one used in my answer. – akrun Jan 23 '17 at 20:05
  • Ah, great, thank you very much! – JKO Jan 23 '17 at 20:07