-1

I have a dataframe that looks like this:

ID   X   Y

 1   A   A
 1   B   A
 2   C   A
 3   A   K
 3   A   A

All of the solutions used for repeated measurements etc. that I could find so far rely on one indicator that varies across all observations but in my case I do not have that. I basically want to summarize all of the observations in row for every ID in the data set.

In the end it should look like this in wide format

ID   X1   Y1   X2   Y2   

1    A    A    B    A 
2    C    A    NA   NA #since there is only one observation for 2 the values for X2 and Y2 should be NA
3    A    K    A    A

Any idea how to do this? I would prefer to use data.table if possible.

PCK1992
  • 213
  • 1
  • 14

2 Answers2

1
library(data.table)
setDT(df)

melt(df, 1)[, i := paste(variable, 1:.N, sep = "_"), 
                keyby = .(ID, variable)][, dcast(.SD, ID ~ i), 
                                             .SDcols = c("ID", "value", "i")]

>    ID X_1  X_2 Y_1  Y_2
  1:  1   A    B   A    A
  2:  2   C <NA>   A <NA>
  3:  3   A    A   K    A

There you have:

  • first you melt the data, so you have all X and Y values under one column
  • Then you create a new variable that tells you if that's the first or second X or Y, grouped by ID and variable (so they're meaningful)
  • Then you cast to wide that table, keeping ID as a column, and the new variable as column headers. You dropped the column variable, as you already had it coded in i.
PavoDive
  • 6,322
  • 2
  • 29
  • 55
1

We can use rowid to create a unique ID over which you can cast the data to wide.

library(data.table)
dcast(setDT(df), ID~rowid(ID), value.var = c('X', 'Y'))

#   ID X_1  X_2 Y_1  Y_2
#1:  1   A    B   A    A
#2:  2   C <NA>   A <NA>
#3:  3   A    A   K    A

data

df <- structure(list(ID = c(1L, 1L, 2L, 3L, 3L), X = c("A", "B", "C", 
"A", "A"), Y = c("A", "A", "A", "K", "A")), 
class = "data.frame", row.names = c(NA, -5L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213