Merge Rows within Data Frame

Question

I have a relational dataset in which I'm looking for dyadic information.

I have four columns: sender, receiver, attribute, and edge.

I'm looking to count repeated sender-receiver pairs and convert those counts into additional edges.

df <- data.frame(sender = c(1,1,1,1,3,5), receiver = c(1,2,2,2,4,5), 
                attribute = c(12,12,12,12,13,13), edge = c(0,1,1,1,1,0))

  sender receiver attribute edge
1      1        1        12    0
2      1        2        12    1
3      1        2        12    1
4      1        2        12    1
5      3        4        13    1
6      5        5        13    0

I want the end result to look like this:

  sender receiver attribute edge
1      1        1        12    0
2      1        2        12    3
3      3        4        13    1

Here, the duplicate sender-receiver rows have been combined into one, and the number of duplicates is reflected in the edge count.

Any input would be really appreciated.

Thanks!

score 20 · Answer 1 · edited May 23 '17 at 12:26

For fun, here are two other options: the first uses the base function aggregate(), and the second uses the data.table package:

> aggregate(edge ~ sender + receiver + attribute, FUN = "sum", data = df)
  sender receiver attribute edge
1      1        1        12    0
2      1        2        12    3
3      3        4        13    1
4      5        5        13    0
> require(data.table)
> dt <- data.table(df)
> dt[, list(sumedge = sum(edge)), by = "sender, receiver, attribute"]
     sender receiver attribute sumedge
[1,]      1        1        12       0
[2,]      1        2        12       3
[3,]      3        4        13       1
[4,]      5        5        13       0

For the record, this question has been asked many, many times; looking through my own previous answers turns up several that would point you down the right path.

Any answer using only base functions always gets +1 from me. — CCC, Jun 18 '12 at 16:09
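Both answers sum `edge` within each group. That equals the duplicate count here only because every repeated row carries `edge = 1`; the 1-1 self-loop has `edge = 0`, so its row count (1) and its edge sum (0) differ. A minimal base R sketch contrasting the two, reusing the `df` from the question (the names `summed` and `counted` are illustrative only):

```r
df <- data.frame(sender = c(1,1,1,1,3,5), receiver = c(1,2,2,2,4,5),
                 attribute = c(12,12,12,12,13,13), edge = c(0,1,1,1,1,0))

# Sum of edge values per sender/receiver/attribute group (what the answers compute)
summed <- aggregate(edge ~ sender + receiver + attribute, data = df, FUN = sum)

# Number of rows per group, i.e. the raw duplicate count regardless of the edge value
counted <- aggregate(edge ~ sender + receiver + attribute, data = df, FUN = length)

summed
counted
```

Which one fits depends on whether `edge` already encodes tie presence (0/1) or whether every row should count as one tie.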

mnel · Accepted Answer · 2012-05-24T03:11:45.010


plyr is your friend, although I think your expected result is not quite correct given the input data: it omits the sender 5, receiver 5 row (attribute 13, edge 0).

library(plyr)

ddply(df, .(sender, receiver, attribute), summarize, edge = sum(edge))

Returns

  sender receiver attribute edge
1      1        1        12    0
2      1        2        12    3
3      3        4        13    1
4      5        5        13    0

edited May 24 '12 at 03:11

answered May 24 '12 at 02:35

I think the OP was not intending to group by `sender + receiver + attribute`, but just by `sender + receiver`, and `attribute` goes along for the ride. In the example, `attribute` just happens to be unique for the `sender + receiver` pairings, but I *think* that was accidental – Mark Lakata Nov 30 '16 at 21:32
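If the grouping should be by `sender` and `receiver` only, as this comment suggests, one base R sketch is to sum `edge` per pair and then attach the first `attribute` seen for each pair. This assumes `attribute` is constant within a pair; if it is not, the first value silently wins. The names `edges`, `pair_attr`, and `result` are illustrative only.

```r
df <- data.frame(sender = c(1,1,1,1,3,5), receiver = c(1,2,2,2,4,5),
                 attribute = c(12,12,12,12,13,13), edge = c(0,1,1,1,1,0))

# Sum edges per sender-receiver pair, ignoring attribute
edges <- aggregate(edge ~ sender + receiver, data = df, FUN = sum)

# Keep the first row of each pair to carry attribute along
# (assumption: attribute does not vary within a pair)
pair_attr <- df[!duplicated(df[c("sender", "receiver")]),
                c("sender", "receiver", "attribute")]

# Join the attribute back on and restore the original column order
result <- merge(edges, pair_attr, by = c("sender", "receiver"))
result <- result[c("sender", "receiver", "attribute", "edge")]
result
```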
