I feel like this should be straightforward but I can't find an existing answer to my question. I have a df:

df <- data.frame(ID = c('a', 'b', 'c', 'c1', 'd', 'e', 'f', 'g', 'h', 'h1'),
                 var2 = c(7, 9, 2, 4, 3, 6, 8, 2, 1, 2),
                 var3 = c(21, 50, 40, 30, 29, 45, 33, 51, 70, 46))

And I'd like to get the mean of every n rows (here n = 2) for columns var2 and var3 separately, so that the output looks like this:

  var2 var3
1  8.0 35.5
2  3.0 35.0
3  4.5 37.0
4  5.0 42.0
5  1.5 58.0

It would be a bonus if I could also keep the first ID of each pair of rows, e.g.:

  ID var2 var3
1  a  8.0 35.5
2  c  3.0 35.0
3  d  4.5 37.0
4  f  5.0 42.0
5  h  1.5 58.0
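For comparison, here is a base R sketch (not taken from this thread) that builds the bonus table without dplyr. It assumes blocks of n = 2 consecutive rows. It uses `aggregate()` for the per-block means and `duplicated()` to pick the row where each block starts. `grp`, `means` and `res` are illustrative names.

```r
# Example data from the question
df <- data.frame(ID = c('a', 'b', 'c', 'c1', 'd', 'e', 'f', 'g', 'h', 'h1'),
                 var2 = c(7, 9, 2, 4, 3, 6, 8, 2, 1, 2),
                 var3 = c(21, 50, 40, 30, 29, 45, 33, 51, 70, 46))

n <- 2  # rows per block (assumed, matching the desired output)

# Block index for each row: 1, 1, 2, 2, 3, 3, ...
grp <- (seq_len(nrow(df)) - 1) %/% n + 1

# Mean of var2 and var3 within each block (aggregate sorts by block index)
means <- aggregate(df[c("var2", "var3")], by = list(group = grp), FUN = mean)

# First ID of each block: the first row where a new block index appears
res <- data.frame(ID = df$ID[!duplicated(grp)], means[c("var2", "var3")])
res
```

With the question's data this should reproduce the bonus table. Because `grp` is computed from row positions, the rows need to be in the intended order before grouping.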

Thanks in advance.

Loz

1 Answer

We need to add a grouping column, and then this is a standard grouped mean:

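library(dplyr)
n = 2  # number of rows per block
df |>
  # block index 1, 1, 2, 2, ... for consecutive runs of n rows
  mutate(group = ((row_number() - 1) %/% n) + 1) |>
  summarize(
    first_id = first(ID),              # keep the first ID of each block
    across(starts_with("var"), mean),  # mean of every column named var*
    .by = group                        # group by the new column (dplyr >= 1.1.0)
  )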
#   group first_id var2 var3
# 1     1        a  8.0 35.5
# 2     2        c  3.0 35.0
# 3     3        d  4.5 37.0
# 4     4        f  5.0 42.0
# 5     5        h  1.5 58.0
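To see what the grouping expression produces, here is a small base R check of the same arithmetic outside of `mutate()`. `rows`, `group` and `group_odd` are illustrative names only. Integer-dividing the zero-based row position by n and adding 1 labels each run of n consecutive rows with the same number.

```r
n <- 2
rows <- 10  # the example df has 10 rows

# Same formula as in mutate(): zero-based position %/% n, shifted to start at 1
group <- ((seq_len(rows) - 1) %/% n) + 1
group
# [1] 1 1 2 2 3 3 4 4 5 5

# With an odd number of rows, the last block simply has fewer members
group_odd <- ((seq_len(11) - 1) %/% n) + 1
table(group_odd)
```

In the 11-row case the last group holds a single row, so its "mean" is just that row's value. Whether that is acceptable depends on the use case.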
Gregor Thomas
  • @Loz If you are running into grouping problems, you probably accidentally loaded `plyr` after `dplyr` and ignored the warning - as [in this R-FAQ](https://stackoverflow.com/q/26106146/903061). If so, you can specify `dplyr::summarize` to fix the issue. Another possibility is that you have a version of `dplyr` before 1.1.0 (released Jan 29, 2023) that doesn't use the `.by` syntax for grouping - so you would need to insert `group_by(group)` after the `mutate` before the `summarize`. – Gregor Thomas Aug 08 '23 at 13:49
  • if I wanted to manually set the column names, how could I change this? – Loz Aug 08 '23 at 13:50
  • Sorry, the grouping error was because I missed the 'n=2'. It works now! – Loz Aug 08 '23 at 13:50
  • Got it: across(c("var2", "var3"), mean) instead of across(starts_with("var"), mean) – Loz Aug 08 '23 at 13:51
  • You could replace `across(starts_with("var"), mean)` with `new_name = mean(var2), other_name = mean(var3)`. Or if you see the `?across` help page there are options for programmatically adjusting the names, like `across(starts_with("var"), mean, .names = "{.col}_mean_2")`. There are also many ways to select the columns using any of the "select helpers". – Gregor Thomas Aug 08 '23 at 13:52
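For `dplyr` versions before 1.1.0, as described in the first comment, here is a sketch of the same pipeline with an explicit `group_by(group)` step instead of `.by`. It also uses the `.names` argument of `across()` mentioned in the last comment; the `"{.col}_mean"` pattern is just an illustrative choice. It assumes `dplyr` 1.0.0 or later, where `across()` and the `.groups` argument of `summarize()` are available. It uses the magrittr pipe `%>%` in case the R version predates the native `|>` (R 4.1).

```r
library(dplyr)

df <- data.frame(ID = c('a', 'b', 'c', 'c1', 'd', 'e', 'f', 'g', 'h', 'h1'),
                 var2 = c(7, 9, 2, 4, 3, 6, 8, 2, 1, 2),
                 var3 = c(21, 50, 40, 30, 29, 45, 33, 51, 70, 46))
n <- 2  # number of rows per block

res <- df %>%
  # block index 1, 1, 2, 2, ... as in the answer
  mutate(group = ((row_number() - 1) %/% n) + 1) %>%
  # explicit grouping step for dplyr < 1.1.0 (no .by argument there)
  group_by(group) %>%
  summarize(
    first_id = first(ID),
    # .names sets the output names, here var2_mean and var3_mean
    across(c(var2, var3), mean, .names = "{.col}_mean"),
    .groups = "drop"  # return an ungrouped result
  )
res
```

If `plyr` is also loaded, calling `dplyr::summarize` explicitly avoids the masking problem mentioned in the first comment.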