
I have a data frame like this:

Year <- 1948:2017
Jan<- rnorm(70)
Feb<- rnorm(70)
Mar<- rnorm(70)
Apr<- rnorm(70)
May<- rnorm(70)
Jun<- rnorm(70)
Jul<- rnorm(70)
Aug<- rnorm(70)
Sep<- rnorm(70)
Oct<- rnorm(70)
Nov<- rnorm(70)
Dec<- rnorm(70)
test_df <- cbind.data.frame(Year, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec)
head(test_df)
########Console result


    Year        Jan        Feb        Mar         Apr
1 1948 -0.5918300  0.0497792 -0.9302350  0.73162688
2 1949 -1.2731259  0.8933090  0.2340527  1.03077077
3 1950 -0.3727786 -0.5680272  1.4439980  0.53150414
4 1951  0.6520741 -1.4229818 -0.9700416 -0.07151535
5 1952  0.4296101 -0.2294352  1.0863566  1.58652232
6 1953  0.3334147 -0.5386016  1.3432490  1.91005906
          May        Jun         Jul         Aug
1  0.28268233  0.7870373 -0.06178119 -0.14469371
2 -0.02048683 -1.4834607 -0.17926819 -0.38662117
3  0.24659095  0.4929837  0.79430914  0.03486687
4 -0.60123934  1.1304690 -0.13452649 -1.07814801
5  1.39161546  0.6827090  0.54729206  0.50188908
6 -0.53882956 -0.3246258  0.09602686 -2.35509441
         Sep        Oct        Nov         Dec
1  2.0492817  0.6185466  2.0427045 -0.06097253
2  0.7804505 -0.3416864 -1.5192509  2.01911948
3  1.9193976 -0.3120360  1.5646020 -0.04911313
4 -0.1147404 -0.3593639  0.5186583  1.39936930
5  2.4481574 -1.2349037 -0.3519640  0.58429371
6  0.6639531 -0.4471403  0.7071486 -1.02036467

I need to group arbitrary months, such as JanFeb, JanMar, AprFeb, or MarMayNov. The grouping could be any combination of months, so there are many possibilities. The value for each group should be the average of its months: for example, JanFeb should be the mean of Jan and Feb, and MarMayNov should be the mean of Mar, May, and Nov. How should I approach this problem? Any help is appreciated. Thanks.

Edit

Let's say, for simplicity, that I only want to group 2 or 3 months at most.

Sayantan4796
  • So the highest possible grouping would be `JanFebMarApr....Dec`, right? Also, if you don't need to group them by year, then you can just take the average for each month and reduce the number of rows to 1. – Shibaprasadb Sep 29 '21 at 06:41
  • That can be done with `rowMeans()` as well. What if I only want to group 2 or 3 months at most? – Sayantan4796 Sep 29 '21 at 06:49

1 Answer


We can create all possible combinations of the month column names using lapply and combn. For each combination, take the row-wise mean of the selected columns as a new column, then bind all such columns together into one data frame.

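cols <- names(test_df)[-1]  # month columns, dropping Year

# For each group size x, build one column of row means per combination of
# x months, named by pasting the month names together (e.g. "JanFeb").
result <- do.call(cbind, lapply(2:length(cols), function(x)
  do.call(cbind, combn(cols, x, function(y) 
    setNames(data.frame(rowMeans(test_df[y])), 
              paste0(y, collapse = "")), simplify = FALSE))))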

If you want to combine only 3 months at most, change 2:length(cols) to 2:3 in lapply.
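As a rough sketch of the 2:3 variant, and of averaging only a few hand-picked groups rather than every combination, something like the following might work. The helper name `group_mean`, the `groups` list, and the `set.seed` call are illustrative assumptions and not part of the original question or answer; the data frame is rebuilt here only so the block runs on its own.

```r
# Reproducible stand-in for the question's data (set.seed is an addition).
set.seed(1)
test_df <- data.frame(Year = 1948:2017,
                      sapply(month.abb, function(m) rnorm(70)))

cols <- names(test_df)[-1]  # "Jan" ... "Dec"

# Hypothetical helper: row-wise mean of one arbitrary group of months,
# returned as a one-column data frame named e.g. "MarMayNov".
group_mean <- function(df, months) {
  stopifnot(all(months %in% names(df)))
  setNames(data.frame(rowMeans(df[months])),
           paste0(months, collapse = ""))
}

# Only the specific groups mentioned in the question.
groups <- list(c("Jan", "Feb"), c("Apr", "Feb"), c("Mar", "May", "Nov"))
picked <- cbind(test_df["Year"],
                do.call(cbind, lapply(groups, group_mean, df = test_df)))

# Every pair and every triple (the 2:3 change described above).
small <- do.call(cbind, lapply(2:3, function(k)
  do.call(cbind, combn(cols, k, function(y) group_mean(test_df, y),
                       simplify = FALSE))))

ncol(small)  # choose(12, 2) + choose(12, 3) columns
head(picked)
```

Restricting the sizes to 2:3 keeps the result to a few hundred columns, whereas 2:length(cols) produces one column for every subset of two or more months, which is several thousand.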

Ronak Shah
  • Is there a bug in your code? The 1st `lapply` calls `function(x)` but `x` is never used and `combn` only combines 2 `cols`. Is `x` intended to be where 2 is? – Rui Barradas Sep 29 '21 at 06:53
  • Yes, that is correct. Thanks for pointing that out @RuiBarradas – Ronak Shah Sep 29 '21 at 06:54