
I am currently struggling to get a mean value within a data frame.


The df is structured like this:

'data.frame':   365 obs. of  5 variables:
$ Day      : chr  "01" "02" "03" "04" ...
$ Month    : Factor w/ 12 levels "01","02","03",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Year     : chr  "2019" "2019" "2019" "2019" ...
$ XXX      : int  2 4 5 5 7 6 6 7 6 6 ...
$ Weekday  : Factor w/ 7 levels "Monday","Tuesday",..: 2 3 4 5 6 7 1 2 3 4 ...

I would like to get the mean of the values in XXX, but only for the first month (`my_data$Month == "01"`). I tried filtering with dplyr but could not figure it out.

(For understanding: for each day there is one value in XXX, the df is for one entire year)

Can someone help? Would be much appreciated!

s_baldur
Eurastas
  • Try `with(df, mean(XXX[Month == "01"]))` or using `dplyr` `df %>% filter(Month == "01") %>% summarise(Mean = mean(XXX))` – akrun Feb 10 '20 at 22:12
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Feb 10 '20 at 22:18
  • akrun's answer will work. I assume if you are this new, his use of pipes is probably going to mess with you. Try filtering your dataframe using dplyr like this: firstMonthDF <- filter(df, Month=="01"). Then you can easily take the mean of the column. – Phil_T Feb 10 '20 at 23:20
  • @akrun how would you do it if it would be needed for several months with(df, mean(XXX[Month == c("01", "03")])) ? – Javier Hernando Jan 18 '23 at 14:14

1 Answer

This base R solution should do it:

mean(df$XXX[df$Month == "01"], na.rm = TRUE)

Explanation:

The function `mean` calculates the average of the variable `XXX` in your data frame `df`. The logical index `df$Month == "01"` subsets `XXX` to the rows that have the value `01` in column `Month`, so the mean is calculated only on the corresponding values. The argument `na.rm = TRUE` makes sure missing values (NAs) are removed before the calculation.
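Since the question does not include sample data, here is a minimal sketch on a made-up data frame. The values in `XXX` (simply the day of the month) and the injected `NA` are assumptions for illustration only; the column names follow the `str()` output in the question.

```r
# Hypothetical stand-in for the asker's data: one row per day of 2019.
dates <- seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")
df <- data.frame(
  Day   = format(dates, "%d"),
  Month = factor(format(dates, "%m")),      # "01" ... "12"
  Year  = format(dates, "%Y"),
  XXX   = as.integer(format(dates, "%d")),  # made-up values: day of month
  stringsAsFactors = FALSE
)
df$XXX[3] <- NA  # one missing value, to show what na.rm does

# Logical index keeps only the January rows of XXX.
jan_values <- df$XXX[df$Month == "01"]

jan_mean_raw <- mean(jan_values)                 # NA, because of the missing value
jan_mean     <- mean(jan_values, na.rm = TRUE)   # mean of the 30 remaining values
```

Without `na.rm = TRUE`, a single `NA` in the subset makes the whole result `NA`, which is why the argument is worth keeping even if the data currently has no gaps.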

EDIT:

Just in case you want to calculate the means not just for one month but all months, you could do this in one go using aggregate:

aggregate(XXX ~ Month, data = df, FUN = mean)
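A minimal sketch of the per-month calculation, again on made-up data (the `XXX` values are an assumption, not the asker's). In the formula interface of `aggregate`, the variable being averaged goes on the left of `~` and the grouping variable on the right, so the call here uses `XXX ~ Month`. The last lines also cover the case raised in the comments, a mean over several selected months, where `%in%` is the membership test to use; `==` compares element by element with recycling and is not a membership test.

```r
# Hypothetical data: one row per day of 2019, XXX = day of month.
dates <- seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")
df <- data.frame(
  Month = factor(format(dates, "%m")),
  XXX   = as.integer(format(dates, "%d"))
)

# One mean per month: returns a data frame with columns Month and XXX.
monthly_means <- aggregate(XXX ~ Month, data = df, FUN = mean)

# A single mean pooled over January and March.
jan_mar_mean <- mean(df$XXX[df$Month %in% c("01", "03")], na.rm = TRUE)
```

Note that the formula method of `aggregate` drops rows with missing values by default (`na.action = na.omit`), so no extra `na.rm` argument is needed there.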
Chris Ruehlemann
  • Thank you! This worked. Thanks as well for the further explanation and addition regarding the aggregate function. Now I can finally dig deeper into the project, much appreciated! :) – Eurastas Feb 11 '20 at 21:19
  • Glad it worked. Do feel free to also click the upward arrow if the answer was useful to your question. – Chris Ruehlemann Feb 11 '20 at 21:24