2

I'd like to know how to consolidate duplicate rows in a data frame and then combine the duplicated values in another column.

Here's a sample of the existing dataframe and two dataframes that would be acceptable as a solution

df1 <- data.frame(col1 = c("test1", "test2", "test2", "test3"), col2 = c(1, 2, 3, 4))
df.ideal <- data.frame(col1 = c("test1", "test2", "test3"), col2 = c(1, "2, 3", 4))
df.ideal2 <- data.frame(col1 = c("test1", "test2", "test3"), 
                        col2 = c(1, 2, 4), 
                        col3 = c(NA, 3, NA))

In the first ideal dataframe, the duplicated row is collapsed and the column is added with both numbers. I've looked at other similar questions on stack overflow, but they all dealt with combining rows. I need to delete the duplicate row because I have another dataset I'm merging it with that needs the a certain number of rows. So, I want to preserve all of the values. Thanks for your help!

tom
  • 977
  • 3
  • 14
  • 30

2 Answers2

6

To go from df1 to df.ideal, you can use aggregate().

aggregate(col2~col1, df1, paste, collapse=",")
#    col1 col2
# 1 test1    1
# 2 test2  2,3
# 3 test3    4

If you want to get to df.ideal2, that's more of a reshaping from long to wide process. You can do

reshape(transform(df1, time=ave(col2, col1, FUN=seq_along)), idvar="col1", direction="wide")
#    col1 col2.1 col2.2
# 1 test1      1     NA
# 2 test2      2      3
# 4 test3      4     NA

using just the base reshape() function.

MrFlick
  • 195,160
  • 17
  • 277
  • 295
2

Another option would be to use splitstackshape

library(data.table)
library(splitstackshape)
DT1 <- setDT(df1)[,list(col2=toString(col2)) ,col1]
DT1
#    col1 col2
#1: test1    1
#2: test2 2, 3
#3: test3    4

You could split the col2 in DT1 to get the df.ideal2 or

cSplit(DT1, 'col2', sep=',')
#   col1 col2_1 col2_2
#1: test1      1     NA
#2: test2      2      3
#3: test3      4     NA

or from df1

 dcast.data.table(getanID(df1, 'col1'), col1~.id, value.var='col2')
 #   col1 1  2
 #1: test1 1 NA
 #2: test2 2  3
 #3: test3 4 NA
akrun
  • 874,273
  • 37
  • 540
  • 662