3

I have a data frame that looks like this:

ID  X1  X2  X3  X4  X5  X6  X7  X8
202 1   0   895 114 17  21  1   4
202 2   0   130 399 74  19  4   4
202 3   0   364 112 48  12  5   4
202 4   4   104 012 83  81  0   4
203 1   0   895 112 76  49  1   5
203 2   2   950 815 32  35  4   5
203 3   0   3.4 156 69  14  5   5
203 4   0   868 025 71  20  0   5
204 2   0   801 398 51  44  4   8
204 4   4   205 000 14  24  0   8

I want to put data corresponding to an ID in one row. For an ID, there are different numbers of rows which are mentioned by "X1" column, "X8" column is same for an ID, and "X2" contains only one non-zero value and I am interested only in this value. If values are not available for new columns, it can be set to 999. So, I want to finally look like:

ID  X8  X2  X3_1    X4_1    X5_1    X6_1    X7_1    X3_2    X4_2    X5_2    X6_2    X7_2    X3_3    X4_3    X5_3    X6_3    X7_3    X3_4    X4_4    X5_4    X6_4    X7_4                                                                            
202 4   4   895     114     17      21      1       130     399     74      19      4       364     112     48      12      5       104     12      83      81      0
203 5   2   895     112     76      49      1       950     815     32      35      4       3.4     156     69      14      5       868     25      71      20      0
204 8   4   999     999     999     999     999     801     398     51      44      4       999     999     999     999     999     205     0       14      24      0

I wish to use R for this. Thanks in advance for the help.

3 Answers3

2

Or with reshape2:

library(reshape2)
> df.melt = melt(df, id.vars =c("ID", "X1","X2", "X8"))
> df.cast = dcast(df.melt, ID + X8 ~variable + X1 , fill = 999)
> df.cast
   ID X8 X3_1 X3_2 X3_3 X3_4 X4_1 X4_2 X4_3 X4_4 X5_1 X5_2 X5_3 X5_4 X6_1 X6_2 X6_3 X6_4 X7_1 X7_2 X7_3 X7_4
1 202  4  895  130  364  104  114  399  112   12   17   74   48   83   21   19   12   81    1    4    5    0
2 203  5  895  950  430  868  112  815  156   25   76   32   69   71   49   35   14   20    1    4    5    0
3 204  8  999  801  999  205  999  398  999    0  999   51  999   14  999   44  999   24  999    4  999    0

merging X2 if needed

> df.merge = merge(df.cast, df[df$X2!=0,c("ID", "X2")], by="ID", all.x =TRUE)
> df.new =df.merge[, c(1,ncol(df.merge), 2:(ncol(df.merge)-1))]
> df.new
   ID X2 X8 X3_1 X3_2 X3_3 X3_4 X4_1 X4_2 X4_3 X4_4 X5_1 X5_2 X5_3 X5_4 X6_1 X6_2 X6_3 X6_4 X7_1 X7_2 X7_3 X7_4
1 202  4  4  895  130  364  104  114  399  112   12   17   74   48   83   21   19   12   81    1    4    5    0
2 203  2  5  895  950  430  868  112  815  156   25   76   32   69   71   49   35   14   20    1    4    5    0
3 204  4  8  999  801  999  205  999  398  999    0  999   51  999   14  999   44  999   24  999    4  999    0
Cabana
  • 419
  • 2
  • 7
1

We could try

 res <- Reduce(function(...) merge(..., by = c("ID", "X8"),
           all=TRUE), split(df1[-(2:3)], df1$X1))
 res[is.na(res)] <- 999
 res$X2 <- df1$X2[df1$X2!=0]
 colnames(res) <-make.unique(colnames(res))
 res[c(1:2, 23, 3:22)]
 #   ID X8 X2 X3.x X4.x X5.x X6.x X7.x X3.y X4.y X5.y X6.y X7.y X3.x.1 X4.x.1 X5.x.1 X6.x.1 X7.x.1 X3.y.1 X4.y.1 X5.y.1 X6.y.1 X7.y.1
 #1 202  4  4  895  114   17   21    1  130  399   74   19    4  364.0    112     48     12      5    104     12     83     81      0
 #2 203  5  2  895  112   76   49    1  950  815   32   35    4    3.4    156     69     14      5    868     25     71     20      0
 #3 204  8  4  999  999  999  999  999  801  398   51   44    4  999.0    999    999    999    999    205      0     14     24      0

Or we can use dcast from data.table which can take multiple value.var columns

library(data.table)
res1 <- dcast(setDT(df1), ID+X8~X1, value.var = paste0("X", 3:7),
                              fill = 999)[, X2 := df1$X2[df1$X2!=0]]
res1   
#    ID X8 X3_1 X3_2  X3_3 X3_4 X4_1 X4_2 X4_3 X4_4 X5_1 X5_2 X5_3 X5_4 X6_1 X6_2 X6_3 X6_4 X7_1 X7_2 X7_3 X7_4 X2
#1: 202  4  895  130 364.0  104  114  399  112   12   17   74   48   83   21   19   12   81    1    4    5    0  4
#2: 203  5  895  950   3.4  868  112  815  156   25   76   32   69   71   49   35   14   20    1    4    5    0  2
#3: 204  8  999  801 999.0  205  999  398  999    0  999   51  999   14  999   44  999   24  999    4  999    0  4
akrun
  • 874,273
  • 37
  • 540
  • 662
0

This can be done using base R reshape() by (1) treating both ID and X8 columns as idvar columns, (2) treating the X1 column as the timevar column, (3) merging on ID with only non-zero X2 rows after the reshaping, and (4) replacing NAs with 999 after the reshaping:

df <- data.frame(ID=c(202L,202L,202L,202L,203L,203L,203L,203L,204L,204L),X1=c(1L,2L,3L,4L,1L,2L,3L,4L,2L,4L),X2=c(0L,0L,0L,4L,0L,2L,0L,0L,0L,4L),X3=c(895,130,364,104,895,950,3.4,868,801,205),X4=c(114L,399L,112L,12L,112L,815L,156L,25L,398L,0L),X5=c(17L,74L,48L,83L,76L,32L,69L,71L,51L,14L),X6=c(21L,19L,12L,81L,49L,35L,14L,20L,44L,24L),X7=c(1L,4L,5L,0L,1L,4L,5L,0L,4L,0L),X8=c(4L,4L,4L,4L,5L,5L,5L,5L,8L,8L));
res <- merge(subset(df[c('ID','X2')],X2!=0L),reshape(df,dir='w',idvar=c('ID','X8'),timevar='X1',drop='X2',sep='_'));
res[is.na(res)] <- 999L;
res;
##    ID X2 X8 X3_1 X4_1 X5_1 X6_1 X7_1 X3_2 X4_2 X5_2 X6_2 X7_2  X3_3 X4_3 X5_3 X6_3 X7_3 X3_4 X4_4 X5_4 X6_4 X7_4
## 1 202  4  4  895  114   17   21    1  130  399   74   19    4 364.0  112   48   12    5  104   12   83   81    0
## 2 203  2  5  895  112   76   49    1  950  815   32   35    4   3.4  156   69   14    5  868   25   71   20    0
## 3 204  4  8  999  999  999  999  999  801  398   51   44    4 999.0  999  999  999  999  205    0   14   24    0
bgoldst
  • 34,190
  • 6
  • 38
  • 64