
I'm trying to do something simple: create a summarised subset of my data frame (df) by calculating the mean of a variable with repeated measurements (measured multiple times a day, over several weeks). This variable is called "consumption" in my df.

I followed the example in "Calculate mean of column data based on conditions in another column" and adapted the code to my df and my desired conditions.

However, when I calculated a few of the means by hand (using Excel), I got completely different results.

Could someone point me in the right direction of where my code is going wrong?

A few of my measurements are "0". They are important and need to be included when calculating the mean.
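As a quick illustration of how zeros are treated, here is a minimal base R sketch with made-up numbers (the vector `x` is hypothetical, not taken from the data above). A `>= 0` filter keeps zeros in the mean, whereas `> 0` would drop them and inflate the result.

```r
# Hypothetical measurements, including a zero that must count toward the mean
x <- c(0, 2, 4)

# Zero is kept by the >= 0 filter: (0 + 2 + 4) / 3
mean_with_zeros <- mean(x[x >= 0])

# Zero is dropped by the > 0 filter: (2 + 4) / 2
mean_without_zeros <- mean(x[x > 0])
```

If the column contains no negative values or NAs, `mean(x)` on its own gives the same result as the `>= 0` version.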

Here is a reproducible example:

df <- read.table("https://pastebin.com/raw/Zpa8cLBN", header = T)
library(dplyr)
df_mean <- df %>% group_by(treatment,day,Control) %>% summarise(
  consumption = first(consumption), consumption = last(consumption), consumption = mean(consumption[consumption >= 0]))
desired_results <- read.table("https://pastebin.com/raw/vZten0jd", header = T) # calculated manually in excel

When I compare the two, the results in the column "consumption", which should be the calculated mean, are not correct at all.

Thanks everyone

Andy

You need to use different variable names in your `summarise`, because here you are modifying `consumption` each time you call it – jlesuffleur Jun 15 '20 at 10:14

Hello, thanks for the tip. I will post it as a response. I didn't realize using the same variable name would cause this issue. – Andy Jun 15 '20 at 10:27

2 Answers

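It appears that I need to use variable names in the `summarise` function that are different from the column names in the original df. Otherwise, each expression overwrites `consumption` inside the call, so the later expressions no longer see the raw measurements.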

df_mean <- df %>% group_by(treatment,day,Control) %>% summarise(
  Mean_consumption = first(consumption), Mean_consumption = last(consumption), Mean_consumption = mean(consumption[consumption >= 0]))

When cross-referenced with my desired_results, this gives what I was looking for.
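The effect can be reproduced on a toy example. The sketch below uses a made-up data frame `toy` (hypothetical, not the pastebin data) and relies on dplyr evaluating the expressions inside one `summarise()` call in order, with each expression seeing the columns created by the earlier ones. Reusing the name `consumption` collapses each group to its first value before `mean()` runs, so the "mean" is just that first value.

```r
library(dplyr)

# Hypothetical toy data, standing in for the real df
toy <- data.frame(
  treatment   = c("A", "A", "A", "B", "B"),
  consumption = c(0, 4, 8, 2, 6)
)

# Reusing the column name: after the first expression, `consumption`
# is a single value per group, so mean() only sees that one value
overwritten <- toy %>%
  group_by(treatment) %>%
  summarise(
    consumption = first(consumption),
    consumption = mean(consumption)
  )

# A new output name leaves the original column intact,
# so mean() sees every measurement in the group (zeros included)
correct <- toy %>%
  group_by(treatment) %>%
  summarise(mean_consumption = mean(consumption))
```

Here `overwritten$consumption` holds the first measurement of each group, while `correct$mean_consumption` holds the actual group means.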

Thanks @jlesuffleur

Andy

We can also do this with `data.table`, giving each summary its own column name:

library(data.table)
setDT(df)[, .(Mean_consumption = first(consumption), Mean_consumptionlast = last(consumption), Mean_consumptionfilt = mean(consumption[consumption >= 0])), .(treatment, day, Control)]
akrun