I want to add a mean of Temp per month as a column to the airquality dataset. So, I want something like this:

  Ozone Solar.R  Wind  Temp Month  Day NEW COLUMN

  41     190   7.4    67     5     1  77.9
  36     118   8      72     5     2  77.9
  12     149  12.6    74     5     3  77.9
  18     313  11.5    62     5     4  77.9
  NA      NA  14.3    56     5     5  77.9
  28      NA  14.9    66     5     6  77.9

The new column is the mean of Temp per month: it repeats the mean of Temp on every row where Month = 5, then a different mean on the rows where Month = 6, and so on.

I've tried this:

 airquality %>% mutate(col = sapply(split(Temp, Month), min))

But I get an error saying that this renders 5 rows, while my dataframe has 153.

How do I solve this in an elegant way?

  • Possible duplicate of [Calculate group mean (or other summary stats) and assign to original data](https://stackoverflow.com/questions/6053620/calculate-group-mean-or-other-summary-stats-and-assign-to-original-data) – camille Sep 27 '19 at 17:19

1 Answer

Instead of split, use group_by with 'Month' and compute the min of 'Temp' in mutate. min returns a numeric value of length 1, which is recycled to fill every row of each group. For the per-month mean that the question asks for, replace min with mean.

library(dplyr)
airquality %>%
    group_by(Month) %>%
    dplyr::mutate(col = min(Temp))
akrun
  • Unfortunately this prints the same number all through the column :( – polarsandwich Sep 27 '19 at 16:46
  • @polarsandwich Please call `dplyr::mutate` explicitly: you have loaded the `plyr` package too, and its `mutate` masks the `dplyr` one. Another option is to restart `R`, load only `library(dplyr)`, and then run the script – akrun Sep 27 '19 at 16:47
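
If the `plyr`/`dplyr` masking described in the comments keeps getting in the way, a base R sketch avoids `mutate` altogether. It assumes the built-in `airquality` data and uses `ave()`, which returns a group-wise summary at the same length as its input. The names `aq` and `MeanTemp` are illustrative only and do not come from the post.

```r
# Base R sketch: per-month mean of Temp, repeated on every row of that month.
# `aq` and `MeanTemp` are hypothetical names chosen for this example.
aq <- airquality

# ave() applies FUN within each Month group and returns a vector
# with one value per original row, so it can be assigned as a column.
aq$MeanTemp <- ave(aq$Temp, aq$Month, FUN = mean)

# Inspect the result: the first rows, then the single value held by each month.
head(aq)
tapply(aq$MeanTemp, aq$Month, unique)
```

Because `ave()` lives in base R, the result does not depend on which packages are attached or in what order they were loaded.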