-1

I have a dataset containing instances of crime, each of which have a location (wardId), crime type (Primary.Type) and Date. The eventual goal is to turn this into network with Primary.Type's as nodes, and the weights of their edges to depend on the number of times they share a unique combination of Primary.Type's and Ward.Date (two types of crime that happen in the same ward on the day are a connection, of which the number of times this combination happened that day determines the weight).

After merging on Ward.Date, wardId and DateI have an edgelist containing Ward.Date ('wardId yyyy-mm-dd'), Primary.Type.x ('string'), Primary.Type.y ('string'), Weight (numeric). The Primary.Types were derived from merging a duplicated table to create an edgelist, hence there are inversed duplicates.

Ward.Date Primary.Type.x Primary.Type.y Weight
1 2017-01-03 Theft Robbery 1
1 2017-01-03 Robbery Theft 1
2 2017-01-05 Battery Burglary 2
2 2017-01-05 Burglary Battery 2

With some help (Deleting reversed duplicates with R) I managed to remove the reversed duplicate values for Primary.Type.x and Primary.Type.y with identical values Ward.Date with:

      test <- edgelist.primary.wards[!duplicated(lapply(as.data.frame(t(edgelist.primary.wards), 
          stringsAsFactors=FALSE), sort)),]

The caveat, however, is that the resulting table deleted the Weight of the reversed pair as well, while this count is not aggregated into the Weight of the pair that is retained:

Ward.Date Primary.Type.x Primary.Type.y Weight
1 2017-01-03 Theft Robbery 1
2 2017-01-05 Burglary Battery 2

Instead, the target output should be:

Ward.Date Primary.Type.x Primary.Type.y Weight
1 2017-01-03 Theft Robbery 2
2 2017-01-05 Burglary Battery 4

Looking forward to seeing your suggestions!

1 Answers1

0

You can sort the values in columns Primary.Type.x and Primary.Type.y and summarise the data with sum of Weight.

library(dplyr)

edgelist.primary.wards %>%
  group_by(Ward.Date, 
           Type1 = pmin(Primary.Type.x, Primary.Type.y), 
           Type2 = pmax(Primary.Type.x, Primary.Type.y)) %>%
  summarise(Weight = sum(Weight, na.rm = TRUE)) -> result

result

This can also be written in base R as :

aggregate(Weight~Ward.Date + Type1 + Type2, transform(edgelist.primary.wards, 
          Type1 = pmin(Primary.Type.x, Primary.Type.y), 
          Type2 = pmax(Primary.Type.x, Primary.Type.y)), sum, na.rm = TRUE)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213