
I have the following dataset:

      ID    Date  Op  SS  type
    1 AR02  03-14 NA  SS  ET
    2 AR02  03-14 NA  SS  ET
    3 AR02  06-14 NA  SS  ET
    4 AR03  31-14 NA  SS  ET
    5 AR08  03-14 NA  SS  ET
    6 AR09  03-14 NA  SS  EN

Since some IDs have multiple (partly duplicated) rows, I would like to get a table that appends those extra rows as additional columns on the same row as the first occurrence. Something like this:

      ID    Date  Op  SS  type  ID    Date  Op  SS  type  ID    Date  Op  SS  type
    1 AR02  03-14 NA  SS  ET    AR02  03-14 NA  SS  ET    AR02  06-14 NA  SS  ET
    2 AR03  31-14 NA  SS  ET    NA    NA    NA  NA  NA    NA    NA    NA  NA  NA
    3 AR08  03-14 NA  SS  ET    NA    NA    NA  NA  NA    NA    NA    NA  NA  NA
    4 AR09  03-14 NA  SS  EN    NA    NA    NA  NA  NA    NA    NA    NA  NA  NA

I am trying to do this in R. I tried `melt`, transpose (`t`), and `aggregate`, but somehow I am unable to get it right. Any help will be appreciated.
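
For reference, here is a minimal reconstruction of the example data as a data frame; the name `df1` matches what the answer below uses, and the character column types are an assumption:

    # reconstruction of the example data shown above (column types assumed)
    df1 <- data.frame(
      ID   = c("AR02", "AR02", "AR02", "AR03", "AR08", "AR09"),
      Date = c("03-14", "03-14", "06-14", "31-14", "03-14", "03-14"),
      Op   = NA_character_,
      SS   = "SS",
      type = c("ET", "ET", "ET", "ET", "ET", "EN"),
      stringsAsFactors = FALSE
    )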

Hash
  • Could you elaborate on the desired transformation? If *AR02* has 6 repeated dates, would you like to create 6 columns, date_*n*, one for each repetition? Did you try `reshape(my.df, direction = "wide", idvar = "ID", timevar = "Date")` (see the sketch after these comments)? – Konrad Nov 04 '15 at 14:00
  • I'm curious *why* you want your data in this wide format. – davechilders Nov 04 '15 at 14:08
  • Thanks Konrad. Yes that was exactly what I was trying to get. Your suggestion was good but in this case I still found one of the duplicated rows (AR02) not appended to the rest. – Hash Nov 04 '15 at 14:11
  • @DMC: Just the way my experimental data is designed. This is a small example of the dataset I am working with. It includes 400+ samples and 33 columns. Just makes it convenient to look at all the samples in one row for further selection. – Hash Nov 04 '15 at 14:14
  • Yes, Akrun, thanks, it was a great help. I have been breaking my head over this since this morning. – Hash Nov 04 '15 at 14:23
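
Building on Konrad's `reshape()` suggestion above, here is a minimal base-R sketch. Because dates repeat within an ID, it uses a per-ID occurrence counter as the time variable; the helper column name `seq` is an assumption, not something from the comment.

    # add a within-ID occurrence counter so reshape() has a unique time variable
    df1$seq <- ave(seq_len(nrow(df1)), df1$ID, FUN = seq_along)
    # one wide row per ID, with columns Date.1, Op.1, ..., Date.2, Op.2, ...
    wide <- reshape(df1, direction = "wide", idvar = "ID", timevar = "seq")
    wide

The column names come out suffixed with the occurrence number (Date.1, Date.2, ...) rather than repeated verbatim as in the desired output above.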

1 Answer


One option would be to convert the 'data.frame' to 'data.table' (`setDT(df1)`), then, grouped by 'ID', paste the rows together and use `cSplit` to split the pasted column ('V1') into separate columns. Note that I am not repeating the 'ID' column, as it is already unique per row.

    library(splitstackshape)
    library(data.table)
    # per ID: paste each row's non-ID columns together, then collapse the rows with ", "
    DT <- setDT(df1)[, do.call(paste, c(.SD, list(collapse=', '))), ID]
    # split the pasted column 'V1' on spaces/commas (regex sep) into separate columns
    DT1 <- cSplit(DT, 'V1', sep='[ ,]+', fixed=FALSE, stripWhite=TRUE)
    # rename everything except 'ID' using the original names, repeated for each set
    setnames(DT1, 2:ncol(DT1), rep(names(df1)[-1], 3))
    DT1
    #     ID  Date Op SS type  Date Op SS type  Date Op SS type
    #1: AR02 03-14 NA SS   ET 03-14 NA SS   ET 06-14 NA SS   ET
    #2: AR03 31-14 NA SS   ET    NA NA NA   NA    NA NA NA   NA
    #3: AR08 03-14 NA SS   ET    NA NA NA   NA    NA NA NA   NA
    #4: AR09 03-14 NA SS   EN    NA NA NA   NA    NA NA NA   NA

It is not recommended to have duplicated column names, so they can be made unique with:

    setnames(DT1, make.unique(names(DT1)))
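
Separately, a similar wide layout can also be produced directly with data.table's `dcast()`; this is only a sketch, assuming a data.table version that provides `rowid()` and accepts multiple `value.var` columns:

    library(data.table)
    # one wide row per ID; the appended sets come out grouped by variable
    # (Date_1, Date_2, Date_3, Op_1, ...) rather than interleaved per repetition
    wide <- dcast(setDT(df1), ID ~ rowid(ID),
                  value.var = c("Date", "Op", "SS", "type"))
    wide

The column names here are already unique, so no `make.unique` step is needed, at the cost of a different column order than the desired output.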
akrun
  • Thanks a ton Akrun! Your fix was perfect! Much appreciated. – Hash Nov 04 '15 at 14:21
  • Hi Akrun, I have one question regarding your fix. Though the fix works perfectly, is there a way to retain the original headers for corresponding columns? – Hash Nov 04 '15 at 14:33
  • @Hash Updated the post, hope it helps. – akrun Nov 04 '15 at 16:30
  • Thanks Akrun. With `setnames(DT1, 2:ncol(DT1), rep(names(df1)[-1], 3))` I am still unable to get the original headers; I get V1_001, V1_002 and so on. Could you suggest a quick fix for this please? Also, I would like to retain the ID column. Can I just get rid of the `[-1]` in `rep(names(df1)[-1], 3)` to achieve that? – Hash Nov 04 '15 at 21:08
  • @Hash I am surprised by that, as I couldn't reproduce the problem. I am using `data.table_1.9.6`; which version of data.table are you using? The 'ID' column is retained. Here I am just renaming all the other columns except the ID using `setnames`. – akrun Nov 05 '15 at 04:10
  • Akrun, it was my bad. I am sorry. I was using the code as is on a different dataset and found the mistake I was making. Thanks anyway for all your help. – Hash Nov 05 '15 at 09:25
  • @Hash I am at my workplace; the link is blocked, as are email accounts. – akrun Nov 05 '15 at 10:07
  • No problem Akrun. When it is convenient please have a look at it if possible. Thanks – Hash Nov 05 '15 at 10:58