2

I would like to know how to use R to merge rows in one set of data.

Currently my data looks like this:

Text 1     Text 2        Text 3       Text 4
Bob        Aba           Abb          Abc
Robert     Aba           Abb          Abc
Fred       Abd           Abe          Abf
Martin     Abg           Abh          Abi

If text two and text 3 are both the same for two rows (as in rows 1 & 2) I would like to make it into one row with more columns for the other data.

Text 1      Text 1a   Text 2         Text 3       Text 4     Text 4a
Bob         Robert     Aba           Abb          Abc        Abd
Fred        NA         Abd           Abe          Abf        NA
Martin      NA         Abg           Abh          Abi        NA

I did something similar with joining two separate sets of data and merging them using join

join=join(Data1, Data2, by = c('Text2'), type = "full", match = "all")

but I can't work out how to do it for duplicates within one set of data.

I think it might be possible to use aggregate but I have not used it before, my attempt was:

MyDataAgg=aggregate(MyData, by=list(MyData$Text1), c)

but when I try I am getting an output that looks like this on summary:

 1    -none-   numeric                              
 1    -none-   numeric                              
 2    -none-   numeric  

or this on structure:

 $ Initials               :List of 12505
  ..$ 1    : int 62
  ..$ 2    : int 310
  ..$ 3    : int 504

I would also like to be able to combine rows using matching elements of two variables.

Wol44
  • 21
  • 2
  • Can you please provide a few more details? How many columns would you have to duplicate, only `Text1` and `Text4`? Do you expect there to be more than 2 copies of each ID pair (is the number of potential columns only twice as large)? – ilir Oct 10 '14 at 10:03
  • In almost all cases there would only be two rows the same - so yes twice the number of columns. I'm not worried about losing the odd bit of data in the unlikely case there are 3 rows the same. But it would have to not break if this happened - so I couldn't do it by sorting the data and relying on every other record being consistent. There are also a lot more variables than in my example. – Wol44 Oct 13 '14 at 16:10

1 Answers1

1

I don't think you can reshape or aggregate because :

  1. You have duplicated rows that corresponds to the same key
  2. You don't have the same number of value for each keys : you should fill it with missing values

Here a a manual attempt using by to process by key, and rbind.fill to aggregate all the list together. Each by step , is creating a one-row data.frame having (Text2,Text3) as key.

do.call(plyr::rbind.fill,by(dat,list(dat$Text2,dat$Text3),
   function(d){
     ## change all other columns to a one row data.frame
     dd <- as.data.frame(as.list(rapply(d[,-c(2,3)],as.character)))
     ## the tricky part : add 1 to a name like Text1 to become Text11 , 
     ## this is import to join data.frames formed by by
     names(dd) <- gsub('(Text[0-9]$)','\\11',names(dd))
     ## add key to to the row
     cbind(unique(d[,2:3]),dd)
   }))

 Text2 Text3 Text11 Text12 Text41 Text42
1   Aba   Abb    Bob Robert    Abc    Abd
2   Abd   Abe   Fred   <NA>    Abf   <NA>
3   Abg   Abh Martin   <NA>    Abi   <NA>
agstudy
  • 119,832
  • 17
  • 199
  • 261
  • Thanks for this. However, I still can't get it to work. Other than changing dat and text are there any other parts that I would need to modify? – Wol44 Oct 13 '14 at 16:00
  • @Wol44 what do you get as error? Please add a `dput(head(data_frame))` to the question. – agstudy Oct 13 '14 at 16:03