Summing column values for (and only for) deleted reversed duplicates with their undeleted reversed originals with R

Question

I have a dataset containing instances of crime, each of which have a location (wardId), crime type (Primary.Type) and Date. The eventual goal is to turn this into network with Primary.Type's as nodes, and the weights of their edges to depend on the number of times they share a unique combination of Primary.Type's and Ward.Date (two types of crime that happen in the same ward on the day are a connection, of which the number of times this combination happened that day determines the weight).

After merging on Ward.Date, wardId and DateI have an edgelist containing Ward.Date ('wardId yyyy-mm-dd'), Primary.Type.x ('string'), Primary.Type.y ('string'), Weight (numeric). The Primary.Types were derived from merging a duplicated table to create an edgelist, hence there are inversed duplicates.

Ward.Date	Primary.Type.x	Primary.Type.y	Weight
1 2017-01-03	Theft	Robbery	1
1 2017-01-03	Robbery	Theft	1
2 2017-01-05	Battery	Burglary	2
2 2017-01-05	Burglary	Battery	2

With some help (Deleting reversed duplicates with R) I managed to remove the reversed duplicate values for Primary.Type.x and Primary.Type.y with identical values Ward.Date with:

      test <- edgelist.primary.wards[!duplicated(lapply(as.data.frame(t(edgelist.primary.wards), 
          stringsAsFactors=FALSE), sort)),]

The caveat, however, is that the resulting table deleted the Weight of the reversed pair as well, while this count is not aggregated into the Weight of the pair that is retained:

Ward.Date	Primary.Type.x	Primary.Type.y	Weight
1 2017-01-03	Theft	Robbery	1
2 2017-01-05	Burglary	Battery	2

Instead, the target output should be:

Ward.Date	Primary.Type.x	Primary.Type.y	Weight
1 2017-01-03	Theft	Robbery	2
2 2017-01-05	Burglary	Battery	4

Looking forward to seeing your suggestions!

score 0 · Answer 1 · answered Mar 09 '21 at 02:26

You can sort the values in columns Primary.Type.x and Primary.Type.y and summarise the data with sum of Weight.

library(dplyr)

edgelist.primary.wards %>%
  group_by(Ward.Date, 
           Type1 = pmin(Primary.Type.x, Primary.Type.y), 
           Type2 = pmax(Primary.Type.x, Primary.Type.y)) %>%
  summarise(Weight = sum(Weight, na.rm = TRUE)) -> result

result

This can also be written in base R as :

aggregate(Weight~Ward.Date + Type1 + Type2, transform(edgelist.primary.wards, 
          Type1 = pmin(Primary.Type.x, Primary.Type.y), 
          Type2 = pmax(Primary.Type.x, Primary.Type.y)), sum, na.rm = TRUE)

Summing column values for (and only for) deleted reversed duplicates with their undeleted reversed originals with R

1 Answers1