2

I have a dataframe df as follows (my real data there are many columns):

  df <- read.table(text = "date    hfgf    lmo
       2019-01-01        0.7         1.4
       2019-02-01        0.11        2.3
       2019-03-01       1.22        6.7
       2020-04-01        0.44        5.2
       2020-05-01        0.19        2.3
       2021-06-01        3.97        9.5
       ,
             header = TRUE, stringsAsFactors = FALSE)

I would like to replace the monthly values in the columns value 1 value 2 etc by the yearly mean.

Note that I can melt and use summarize function but I need to keep the columns as they are.

bic ton
  • 1,284
  • 1
  • 9
  • 16
  • [Can dplyr summarise over several variables without listing each one?](https://stackoverflow.com/questions/21295936/can-dplyr-summarise-over-several-variables-without-listing-each-one) – Henrik Dec 05 '21 at 19:05

3 Answers3

2

If we want to update the columns with the yearly mean, do a grouping by the year extracted 'date' and use mutate to update the columns with the mean of those columns by looping across

If it is to return a single mean row per 'year', use summarise

library(lubridate)
library(dplyr)
df %>%
    group_by(year = year(date)) %>% 
     summarise(across(where(is.numeric), mean, na.rm = TRUE))

-output

# A tibble: 3 × 3
   year  hfgf   lmo
  <dbl> <dbl> <dbl>
1  2019 0.677  3.47
2  2020 0.315  3.75
3  2021 3.97   9.5 
akrun
  • 874,273
  • 37
  • 540
  • 662
  • @bicton you are saying different things If you have only the 'date' column and want the mean of all numeric columns `summarise(across(where(is.numeric), mean, na.rm = TRUE))` – akrun Dec 05 '21 at 19:03
  • @bicton if there are columns that you want to get the mean of but exclude some other columns, and there is no pattern, then you have to use as a vector i.e. c(hghf, lmo, ..)` – akrun Dec 05 '21 at 19:04
1

Here is a base R solution with aggregate.

res <- aggregate(cbind(hfgf, lmo) ~ format(df$date, "%Y"), df, mean)
names(res)[1] <- names(df)[1]

res
#  date      hfgf      lmo
#1 2019 0.6766667 3.466667
#2 2020 0.3150000 3.750000
#3 2021 3.9700000 9.500000
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
1

I am not sure but maybe you mean this kind of solution. In essence it is same as akrun's solution: Here with mutate, alternatively you could use the outcommented summarise:

library(lubridate)          
library(dplyr)

df %>% 
  group_by(year = year(date)) %>% 
  mutate(across(c(hfgf, lmo), mean, na.rm=TRUE, .names = "mean_{unique(year)}_{.col}"))
 # summarise(across(c(hfgf, lmo), mean, na.rm=TRUE, .names = "mean_{unique(year)}_{.col}"))
 date    hfgf   lmo  year mean_2019_hfgf mean_2019_lmo mean_2020_hfgf mean_2020_lmo mean_2021_hfgf mean_2021_lmo
  <chr>  <dbl> <dbl> <dbl>          <dbl>         <dbl>          <dbl>         <dbl>          <dbl>         <dbl>
1 2019-~  0.7    1.4  2019          0.677          3.47         NA             NA             NA             NA  
2 2019-~  0.11   2.3  2019          0.677          3.47         NA             NA             NA             NA  
3 2019-~  1.22   6.7  2019          0.677          3.47         NA             NA             NA             NA  
4 2020-~  0.44   5.2  2020         NA             NA             0.315          3.75          NA             NA  
5 2020-~  0.19   2.3  2020         NA             NA             0.315          3.75          NA             NA  
6 2021-~  3.97   9.5  2021         NA             NA            NA             NA              3.97           9.5
TarJae
  • 72,363
  • 6
  • 19
  • 66