Combine data by ID in R

Question

I have a dataset where data from participants (ID) were collected multiple times. Is there a way to combine their data by ID if they have the same variables? I'm hoping to collapse the data to have one row per ID.

Looks something like this but I'm working with about 100+ variables:

id<-c(1,1,1,1,2,2,2,3,3,3,4,4,4,4)
age<-c(19,NA,NA,NA,52,NA,NA,22,NA,NA,36,NA,NA,NA)
values<-c(9,9,9,9,9,8,8,8,8,6,8,7,7,7)
acceptance<-c(8,8,7,6,9,7,5,8,8,5,8,7,6,5)
diffusion<-c(9,8,7,6,9,9,9,9,8,5,9,8,8,5)
attitudes<-c(7,7,6,6,9,7,7,8,8,8,9,6,3,3)
df<-data.frame(id,age,values,acceptance,diffusion,attitudes)

         id  age  values  acceptance  diffusion  attitudes

 1       1   19   9         8          9           7 
 2       1   NA   9         8          8           7
 3       1   NA   9         7          7           6
 4       1   NA   9         6          6           6
 5       2   52   9         9          9           9
 6       2   NA   8         7          9           7
 7       2   NA   8         5          9           7
 8       3   22   8         8          8           8
 9       3   NA   8         8          5           8
10       3   NA   6         5          9           8
11       4   36   8         8          9           9
12       4   NA   7         7          8           6
13       4   NA   7         7          8           3
14       4   NA   7         5          5           3

These scores (values, acceptance, etc.) were collected before and a few times after an intervention, but I'm hoping to collapse all of them under one ID if that's possible.

Desired output:

         id  age  values   acceptance  diffusion  attitudes

 1       1   19   9,9,9,9   8,8,7,6    9,8,7,6     7,7,6,6 
 2       2   52   9,8,8     9,7,5      9,9,9       9,7,7
 3       3   22   8,8,6     8,8,5      8,5,9       8,8,8
 4       4   36   8,7,7,7   8,7,7,5    9,8,8,5     9,6,3,3

I haven't come across a post with a similar issue and I'm not sure if it's possible to do so on R so any help would be appreciated! Thank you!

You've got 4 rows with ID 1, each with different values for `diffusion`. How do you want to combine them? Sum, mean, first value, last value, concatenate, .... ?? — Gregor Thomas, Jul 27 '21 at 02:52
Sorry, I guess I should've been clearer. I edited the post and hopefully I make sense. — n23, Jul 27 '21 at 03:39
Possible duplicate of https://stackoverflow.com/questions/22756372/collapse-text-by-group-in-data-frame — akrun, Jul 28 '21 at 01:36
Or another duplicate [here](https://stackoverflow.com/questions/59758011/collapse-columns-from-long-to-wide-format-with-data-table) — akrun, Jul 28 '21 at 02:47
Does this answer your question? [Collapse / concatenate / aggregate a column to a single comma separated string within each group](https://stackoverflow.com/questions/15933958/collapse-concatenate-aggregate-a-column-to-a-single-comma-separated-string-w) — camille, Dec 23 '21 at 20:19

score 1 · Answer 1 · answered Jul 27 '21 at 03:01

You can use dplyr grouping and summary methods to apply whatever collapsing/aggregation function you want. You can even pass in a generic f and adjust as needed:

library(dplyr)

f <- mean

df %>% 
  group_by(id) %>% 
  summarise(across(everything(), ~f(., na.rm = TRUE)))

score 1 · Accepted Answer · answered Jul 27 '21 at 05:43

fill the NA values in age and for each id and age collapse the values with toString.

library(dplyr)
library(tidyr)

df %>%
  fill(age) %>%
  group_by(id, age) %>%
  summarise(across(.fns = toString), .groups = 'drop')

#    id   age values     acceptance diffusion  attitudes 
#  <dbl> <dbl> <chr>      <chr>      <chr>      <chr>     
#1     1    19 9, 9, 9, 9 8, 8, 7, 6 9, 8, 7, 6 7, 7, 6, 6
#2     2    52 9, 8, 8    9, 7, 5    9, 9, 9    9, 7, 7   
#3     3    22 8, 8, 6    8, 8, 5    9, 8, 5    8, 8, 8   
#4     4    36 8, 7, 7, 7 8, 7, 6, 5 9, 8, 8, 5 9, 6, 3, 3

Combine data by ID in R

2 Answers2