I'm trying to achieve a simple task: creating a summary of my data frame (df) by calculating the mean of a variable with repeated measurements (measured multiple times a day over several weeks). This variable is called "consumption" in my df.
I followed this example and adapted the code to my df and my desired conditions: Calculate mean of column data based on conditions in another column
However, when I calculated a few of the means by hand (in Excel), I got completely different results.
Could someone point me in the right direction as to where my code is going wrong?
A few of the measurements are 0. These zeros are important and need to be included when calculating the mean.
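As a quick sanity check on the zeros (the values below are hypothetical, not taken from the data): R's `mean()` treats 0 as an ordinary value, and a `>= 0` filter like the one in the code below keeps the zeros, whereas `> 0` would silently drop them.

```r
# Hypothetical measurements for a single group, including two zeros
x <- c(4, 0, 2, 0)

all_values  <- mean(x)          # zeros count: (4 + 0 + 2 + 0) / 4 = 1.5
keep_zeros  <- mean(x[x >= 0])  # `>= 0` keeps the zeros, so also 1.5
drop_zeros  <- mean(x[x > 0])   # `> 0` drops them: (4 + 2) / 2 = 3

print(c(all_values, keep_zeros, drop_zeros))
```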
Here is a reproducible example:
```r
library(dplyr)

df <- read.table("https://pastebin.com/raw/Zpa8cLBN", header = TRUE)

df_mean <- df %>%
  group_by(treatment, day, Control) %>%
  summarise(
    consumption = first(consumption),
    consumption = last(consumption),
    consumption = mean(consumption[consumption >= 0])
  )

# calculated manually in Excel
desired_results <- read.table("https://pastebin.com/raw/vZten0jd", header = TRUE)
```
When I compare the two, the values in the "consumption" column of df_mean, which should be the calculated means, do not match the manual results at all.
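One plausible explanation, offered as a hedged sketch rather than a confirmed diagnosis: `summarise()` evaluates its arguments in order, and dplyr's documentation notes that newly created summaries immediately overwrite existing variables of the same name. In the code above, `consumption = first(consumption)` would therefore reduce each group to its first measurement before `mean()` ever runs, so each "mean" would just be the first value of its group. The toy data frame below is hypothetical (not the pastebin data). It reproduces the overwrite using a separately named `avg` column, then computes a plain per-group `mean()` and cross-checks it with base R's `aggregate()`.

```r
library(dplyr)

# Hypothetical toy data, NOT the pastebin data: two treatment groups with
# repeated measurements on one day, including zeros that must be kept.
toy <- data.frame(
  treatment   = c("A", "A", "A", "B", "B", "B"),
  day         = c(1, 1, 1, 1, 1, 1),
  Control     = c("no", "no", "no", "no", "no", "no"),
  consumption = c(2, 0, 7, 1, 3, 0)
)

# Reproduces the pitfall: the first expression replaces `consumption`
# with one value per group, so the later mean() only sees that value.
overwritten <- toy %>%
  group_by(treatment, day, Control) %>%
  summarise(
    consumption = first(consumption),
    avg = mean(consumption),
    .groups = "drop"
  )

# Per-group mean of every measurement; zeros count like any other value.
toy_mean <- toy %>%
  group_by(treatment, day, Control) %>%
  summarise(consumption = mean(consumption), .groups = "drop")

# Base R cross-check, similar in spirit to checking a few means by hand.
check <- aggregate(consumption ~ treatment + day + Control, data = toy, FUN = mean)

print(overwritten)
print(toy_mean)
print(check)
```

Applied to the real data, the corresponding sketch would be `df %>% group_by(treatment, day, Control) %>% summarise(consumption = mean(consumption))`. If negative values ever need to be excluded, `mean(consumption[consumption >= 0])` can be used as the single expression instead, since `>= 0` already keeps the zeros. Whether the result then matches the Excel values also depends on the Excel groups matching the `group_by()` columns, which this sketch cannot verify.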
Thanks everyone