0

this is probably trivial, but my data looks like this:

t <- structure(list(var = 1:5, ID = c(1, 2, 1, 1, 3)), class = "data.frame", row.names = c(NA,
-5L))
> t
  var ID
1   1  1
2   2  2
3   3  1
4   4  1
5   5  3

I would like to get a mean value for each ID, so my idea was to transform them into this (variable names are not important):

f <- structure(list(ID = 1:3, var.1 = c(1, 2, 5), var.2 = c(2, NA,
NA), var.3 = c(3, NA, NA)), class = "data.frame", row.names = c(NA,
-3L))
> f
  ID var.1 var.2 var.3
1  1     1     2     3
2  2     2    NA    NA
3  3     5    NA    NA

so that I could then calculate the mean for each var.x.

I know it's possible with tidyr (possibly pivot_wider?), but I can't figure out how to group it. How do I get a mean value for each ID?

Thank you in advance

Giulio Centorame
  • 678
  • 4
  • 19
  • What are you taking the mean of here? Are you taking a mean of the var values? But you seem to be tuning those var values into columns so the mean of var 1 will be 1 by definition. If you just want mean per ID then `t %>% group_by(ID) %>% summarize(mean(var))` is probably what you want. – MrFlick Feb 25 '22 at 02:35
  • 1
    Beware of using reserved names like `t` for object names! – jay.sf Feb 25 '22 at 05:39

2 Answers2

1

You could use ave to get the mean of var for each ID:

t$mean = ave(t$var, t$ID, FUN = mean)

Result:

  var ID     mean
1   1  1 2.666667
2   2  2 2.000000
3   3  1 2.666667
4   4  1 2.666667
5   5  3 5.000000

If you want a simple table with the means, you could use aggregate:

aggregate(formula = var~ID, data = t, FUN = mean)

  ID      var
1  1 2.666667
2  2 2.000000
3  3 5.000000 
R. Schifini
  • 9,085
  • 2
  • 26
  • 32
  • The first option does work, my only worry with `ave()` is that it would take ages with too many data points (I'd have to scale it up to several thousand observations, hence the intermediate object I am creating) – Giulio Centorame Feb 25 '22 at 05:02
0

If you want to use rowMeans on your t dataframe, then we can first use pivot_wider, then get the mean of the row.

library(tidyverse)

t %>% 
  group_by(ID) %>% 
  mutate(row = row_number()) %>% 
  ungroup %>% 
  pivot_wider(names_from = row, values_from = var, names_prefix = "var.") %>% 
  mutate(mean = rowMeans(select(., starts_with("var")), na.rm = TRUE))

#      ID var.1 var.2 var.3  mean
#   <dbl> <int> <int> <int> <dbl>
# 1     1     1     3     4  2.67
# 2     2     2    NA    NA  2   
# 3     3     5    NA    NA  5   

Or since t is in long form, then we can just group by ID, then get the mean for all values in that group.

t %>% 
  group_by(ID) %>% 
  summarise(mean = mean(var))

#     ID  mean
#  <dbl> <dbl>
#1     1  2.67
#2     2  2   
#3     3  5   

Or for f, we can use rowMeans for each row that will include any column that starts with var.

f %>% 
   mutate(mean = rowMeans(select(., starts_with("var")), na.rm = TRUE))

#  ID var.1 var.2 var.3 mean
#1  1     1     2     3    2
#2  2     2    NA    NA    2
#3  3     5    NA    NA    5
AndrewGB
  • 16,126
  • 5
  • 18
  • 49