3

I have a table as follows:

df <- data.table(ID = c(1,2,3,1,2,1,1,2,3),
                 vlaue = c("A", "B", "C", "A", "B", "A", "A", "B", "C"))

> df
      ID vlaue
  1:  1     A
  2:  2     B
  3:  3     C
  4:  1     A
  5:  2     B
  6:  1     A
  7:  1     A
  8:  2     B
  9:  3     C

I need to create a second table from this where the columns are the unique ID values and the rows are the corresponding value column like this with NAs where there is missing data,

> df2
     1 2 3
  1: A B C
  2: A B  
  3: A    
  4: A B C

I tried reshape, dcast, played with a matrix but nothing produces the table as I want.

ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
Andrew Scotchmer
  • 313
  • 3
  • 12

2 Answers2

3

In data.table:

library(data.table)
dcast(df[, gp := cumsum(ID == 1)], gp ~ ID, value.var = 'vlaue')
#    gp 1    2    3
# 1:  1 A    B    C
# 2:  2 A    B <NA>
# 3:  3 A <NA> <NA>
# 4:  4 A    B    C

In dplyr:

library(tidyr)
library(dplyr)

df %>% 
  mutate(gp = cumsum(ID == 1)) %>% 
  pivot_wider(names_from = "ID", values_from = "vlaue") %>% 
  select(-gp)
Maël
  • 45,206
  • 3
  • 29
  • 67
1

for data.table

If df is already a data.table object, we can use dcast as below as well

> dcast(df, rowid(ID) ~ ID, value.var = "vlaue")[, -1]
   1    2    3
1: A    B    C
2: A    B    C
3: A    B <NA>
4: A <NA> <NA>

for data.frame

If df is a data.frame (rather than a data.table), you can try the dplyr approach by @Maël, or a base R option with reshape

reshape(
  transform(
    df,
    row = ave(ID, ID, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "row",
  timevar = "ID"
)[-1]

which gives

  vlaue.1 vlaue.2 vlaue.3
1       A       B       C
4       A       B       C
6       A       B    <NA>
7       A    <NA>    <NA>
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81