I have some data where the first column contains duplicated values, though another column holds all different values. I need to keep just one row per duplicated value in the first column and merge the differing values from the other column. For example:

Z = c( "a", "a", "b", "c", "d", "d", "d")
X = c( 10, 10, 0, 3, 4, 4, 4)
Y = c("ab", "bc", "dv", "mh", "op", "va", "po")
c = data.frame(Z,X,Y)

c

  Z  X  Y
1 a 10 ab
2 a 10 bc
3 b  0 dv
4 c  3 mh
5 d  4 op
6 d  4 va
7 d  4 po

I need to merge it into

Z  X   Y
a 10  ab,bc
b  0  dv
c  3  mh
d  4  op, va, po

or even

Z  X   Y    L   V
a  10  ab  bc
b   0  dv
c   3  mh
d   4  op  va  po

Is it possible?

talat

3 Answers

We can try with data.table:

library(data.table)
setDT(c)[, .(X = unique(X), Y = paste(Y, collapse = ",")), by = Z]
#  Z  X        Y
#1: a 10    ab,bc
#2: b  0       dv
#3: c  3       mh
#4: d  4 op,va,po
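
A note on the call above: X = unique(X) relies on X being constant within each Z. If it were not, unique(X) would yield more than one row per group. A minimal sketch that avoids this by grouping on both columns is below; the name `dat` is my own choice for illustration (it also avoids masking base `c()`), and it assumes data.table is installed.

```r
library(data.table)

# Rebuild the example data from the question (named `dat` here, not `c`)
dat <- data.table(
  Z = c("a", "a", "b", "c", "d", "d", "d"),
  X = c(10, 10, 0, 3, 4, 4, 4),
  Y = c("ab", "bc", "dv", "mh", "op", "va", "po")
)

# Group by both Z and X, so X never has to be collapsed with unique()
res <- dat[, .(Y = paste(Y, collapse = ",")), by = .(Z, X)]
print(res)
```

Groups come back in order of first appearance, so the rows stay in the a, b, c, d order of the input.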
mtoto

The plyr package is handy in these situations:

library(plyr)
ddply(c, c("Z", "X"), summarise, Y = paste(Y, collapse = ","))
  Z  X        Y
1 a 10    ab,bc
2 b  0       dv
3 c  3       mh
4 d  4 op,va,po
csgillespie

In base R:

aggregate(Y ~ Z + X, data = c, toString)

which gives:

  Z  X          Y
1 b  0         dv
2 c  3         mh
3 d  4 op, va, po
4 a 10     ab, bc
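
Two details of that output may be worth spelling out: toString() joins with ", " (comma plus space), and aggregate() sorts the result by the grouping variables, here effectively by X. A small sketch, assuming you would rather have a plain "," separator and the rows ordered by Z; the name `dat` is just an illustrative stand-in for the question's data frame:

```r
# Rebuild the example data from the question (named `dat` for illustration)
dat <- data.frame(
  Z = c("a", "a", "b", "c", "d", "d", "d"),
  X = c(10, 10, 0, 3, 4, 4, 4),
  Y = c("ab", "bc", "dv", "mh", "op", "va", "po")
)

# toString(x) is shorthand for paste(x, collapse = ", ")
same <- identical(toString(c("op", "va", "po")),
                  paste(c("op", "va", "po"), collapse = ", "))

# Extra arguments to aggregate() are passed on to FUN, so paste() can be
# used directly with a custom separator
res <- aggregate(Y ~ Z + X, data = dat, FUN = paste, collapse = ",")

# Reorder the rows by Z instead of X
res <- res[order(res$Z), ]
print(res)
```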

Or with dplyr:

library(dplyr)
c %>% group_by(Z,X) %>% summarise(Y = toString(Y))

which gives the same merged values, returned as a tibble sorted by Z.
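
For the second ("or even") layout from the question, with one column per value, here is a rough base R sketch. It assumes X is constant within each Z; `dat` and the column names Y1, Y2, ... are my own illustrative choices, not something from the question.

```r
# Rebuild the example data from the question (named `dat` for illustration)
dat <- data.frame(
  Z = c("a", "a", "b", "c", "d", "d", "d"),
  X = c(10, 10, 0, 3, 4, 4, 4),
  Y = c("ab", "bc", "dv", "mh", "op", "va", "po"),
  stringsAsFactors = FALSE
)

# One row per Z/X pair, in order of first appearance
keys <- unique(dat[c("Z", "X")])

# Collect the Y values of each Z, in the same order as `keys`
parts <- split(dat$Y, dat$Z)[as.character(keys$Z)]

# Pad every group with "" up to the size of the largest group
width <- max(lengths(parts))
padded <- do.call(rbind, lapply(parts, function(v) {
  c(v, rep("", width - length(v)))
}))
colnames(padded) <- paste0("Y", seq_len(width))

# Bind the keys and the padded values into one wide data frame
wide <- data.frame(keys, padded, stringsAsFactors = FALSE)
rownames(wide) <- NULL
print(wide)
```

Empty strings are used as padding to mirror the blank cells in the question; NA would probably be the more idiomatic filler if the result is processed further.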

Jaap