0

I have the data in below format

ID  ID_2    Item ID
001 111 1111
001 111 1112
001 112 1113
001 112 1114
001 112 1115
001 112 1116
001 113 1117
008 222 1118
008 222 1119
008 223 1120
011 333 1121
012 444 1122
012 444 1123
012 444 1124
012 444 1125
012 444 1126
017 555 1127
017 555 1128
017 555 1129

but i want to convert item_ID into multiple variables based on the order they appear in ID_2 and ID_1.Somewhat like what i have mentioned below

ID  ID_2    Item_1  Item_2  Item_3  Item_4  Item_5
001 111 1111    1112            
001 112 1113    1114    1115    1116    
001 113 1117                
008 222 1118    1119            
008 223 1120                
011 333 1121                
012 444 1122    1123    1124    1125    1126
017 555 1127    1128    1129        

I have tried Concatenating ID and ID_2, then create a key number 1,2.. so that the key changes as soon as the Concatenated value changes. But I have 38L rows of data it keeps on going.

I would appreciate any help from a person who has dealt with similar problem before.Thanks

Adam Quek
  • 6,973
  • 1
  • 17
  • 23

3 Answers3

1

Using dplyr and tidyr packages:

dat %>% 
    group_by(ID, ID_2) %>% 
    mutate(item = paste0("Item_", 1:n())) %>% 
    spread(item, Item_ID)

Source: local data frame [8 x 7]
Groups: ID, ID_2 [8]

     ID  ID_2 Item_1 Item_2 Item_3 Item_4 Item_5
* <int> <int>  <int>  <int>  <int>  <int>  <int>
1     1   111   1111   1112     NA     NA     NA
2     1   112   1113   1114   1115   1116     NA
3     1   113   1117     NA     NA     NA     NA
4     8   222   1118   1119     NA     NA     NA
5     8   223   1120     NA     NA     NA     NA
6    11   333   1121     NA     NA     NA     NA
7    12   444   1122   1123   1124   1125   1126
8    17   555   1127   1128   1129     NA     NA
Adam Quek
  • 6,973
  • 1
  • 17
  • 23
1

We can use dcast from data.table

library(data.table)
dcast(setDT(df1), ID + ID_2 ~ paste0("Item_", rowid(ID, ID_2)), value.var = "ItemID")
#    ID ID_2 Item_1 Item_2 Item_3 Item_4 Item_5
#1:  1  111   1111   1112     NA     NA     NA
#2:  1  112   1113   1114   1115   1116     NA
#3:  1  113   1117     NA     NA     NA     NA
#4:  8  222   1118   1119     NA     NA     NA
#5:  8  223   1120     NA     NA     NA     NA
#6: 11  333   1121     NA     NA     NA     NA
#7: 12  444   1122   1123   1124   1125   1126
#8: 17  555   1127   1128   1129     NA     NA
akrun
  • 874,273
  • 37
  • 540
  • 662
1

A little long way using str_split_fixed and str_count from stringr library can be following:

library("stringr")

df1<- aggregate( Item_ID ~ ID_2 + ID ,data=df,paste0,collapse =",")
maxl <- max(str_count(df1$Item_ID,","))+1
splitcols <- str_split_fixed(df1$Item_ID,pattern=",",n=maxl)
setNames(data.frame(df1[,2:1],splitcols),c("ID","ID_2",paste0("Item_",1:maxl)))

Output:

#    ID ID_2 Item_1 Item_2 Item_3 Item_4 Item_5
# 1  1  111   1111   1112                     
# 2  1  112   1113   1114   1115   1116       
# 3  1  113   1117                            
# 4  8  222   1118   1119                     
# 5  8  223   1120                            
# 6 11  333   1121                            
# 7 12  444   1122   1123   1124   1125   1126
# 8 17  555   1127   1128   1129              
# > 
PKumar
  • 10,971
  • 6
  • 37
  • 52