1

I have a dataset where data from participants (ID) were collected multiple times. Is there a way to combine their data by ID if they have the same variables? I'm hoping to collapse the data to have one row per ID.

Looks something like this but I'm working with about 100+ variables:

id<-c(1,1,1,1,2,2,2,3,3,3,4,4,4,4)
age<-c(19,NA,NA,NA,52,NA,NA,22,NA,NA,36,NA,NA,NA)
values<-c(9,9,9,9,9,8,8,8,8,6,8,7,7,7)
acceptance<-c(8,8,7,6,9,7,5,8,8,5,8,7,6,5)
diffusion<-c(9,8,7,6,9,9,9,9,8,5,9,8,8,5)
attitudes<-c(7,7,6,6,9,7,7,8,8,8,9,6,3,3)
df<-data.frame(id,age,values,acceptance,diffusion,attitudes)

         id  age  values  acceptance  diffusion  attitudes

 1       1   19   9         8          9           7 
 2       1   NA   9         8          8           7
 3       1   NA   9         7          7           6
 4       1   NA   9         6          6           6
 5       2   52   9         9          9           9
 6       2   NA   8         7          9           7
 7       2   NA   8         5          9           7
 8       3   22   8         8          8           8
 9       3   NA   8         8          5           8
10       3   NA   6         5          9           8
11       4   36   8         8          9           9
12       4   NA   7         7          8           6
13       4   NA   7         7          8           3
14       4   NA   7         5          5           3

These scores (values, acceptance, etc.) were collected before and a few times after an intervention, but I'm hoping to collapse all of them under one ID if that's possible.

Desired output:

         id  age  values   acceptance  diffusion  attitudes

 1       1   19   9,9,9,9   8,8,7,6    9,8,7,6     7,7,6,6 
 2       2   52   9,8,8     9,7,5      9,9,9       9,7,7
 3       3   22   8,8,6     8,8,5      8,5,9       8,8,8
 4       4   36   8,7,7,7   8,7,7,5    9,8,8,5     9,6,3,3

I haven't come across a post with a similar issue and I'm not sure if it's possible to do so on R so any help would be appreciated! Thank you!

n23
  • 89
  • 10
  • 1
    Please show the expected output – akrun Jul 27 '21 at 02:46
  • 1
    You've got 4 rows with ID 1, each with different values for `diffusion`. How do you want to combine them? Sum, mean, first value, last value, concatenate, .... ?? – Gregor Thomas Jul 27 '21 at 02:52
  • Sorry, I guess I should've been clearer. I edited the post and hopefully I make sense. – n23 Jul 27 '21 at 03:39
  • Possible duplicate of https://stackoverflow.com/questions/22756372/collapse-text-by-group-in-data-frame – akrun Jul 28 '21 at 01:36
  • Or another duplicate [here](https://stackoverflow.com/questions/59758011/collapse-columns-from-long-to-wide-format-with-data-table) – akrun Jul 28 '21 at 02:47
  • Does this answer your question? [Collapse / concatenate / aggregate a column to a single comma separated string within each group](https://stackoverflow.com/questions/15933958/collapse-concatenate-aggregate-a-column-to-a-single-comma-separated-string-w) – camille Dec 23 '21 at 20:19

2 Answers2

1

You can use dplyr grouping and summary methods to apply whatever collapsing/aggregation function you want. You can even pass in a generic f and adjust as needed:

library(dplyr)

f <- mean

df %>% 
  group_by(id) %>% 
  summarise(across(everything(), ~f(., na.rm = TRUE)))
andrew_reece
  • 20,390
  • 3
  • 33
  • 58
1

fill the NA values in age and for each id and age collapse the values with toString.

library(dplyr)
library(tidyr)

df %>%
  fill(age) %>%
  group_by(id, age) %>%
  summarise(across(.fns = toString), .groups = 'drop')

#    id   age values     acceptance diffusion  attitudes 
#  <dbl> <dbl> <chr>      <chr>      <chr>      <chr>     
#1     1    19 9, 9, 9, 9 8, 8, 7, 6 9, 8, 7, 6 7, 7, 6, 6
#2     2    52 9, 8, 8    9, 7, 5    9, 9, 9    9, 7, 7   
#3     3    22 8, 8, 6    8, 8, 5    9, 8, 5    8, 8, 8   
#4     4    36 8, 7, 7, 7 8, 7, 6, 5 9, 8, 8, 5 9, 6, 3, 3
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213