I have a dataset containing instances of crime, each of which have a location (wardId), crime type (Primary.Type) and Date. The eventual goal is to turn this into network with Primary.Type's as nodes, and the weights of their edges to depend on the number of times they share a unique combination of Primary.Type's and Ward.Date (two types of crime that happen in the same ward on the day are a connection, of which the number of times this combination happened that day determines the weight).
After merging on Ward.Date, wardId and DateI have an edgelist containing Ward.Date ('wardId yyyy-mm-dd'), Primary.Type.x ('string'), Primary.Type.y ('string'), Weight (numeric). The Primary.Types were derived from merging a duplicated table to create an edgelist, hence there are inversed duplicates.
Ward.Date | Primary.Type.x | Primary.Type.y | Weight |
---|---|---|---|
1 2017-01-03 | Theft | Robbery | 1 |
1 2017-01-03 | Robbery | Theft | 1 |
2 2017-01-05 | Battery | Burglary | 2 |
2 2017-01-05 | Burglary | Battery | 2 |
With some help (Deleting reversed duplicates with R) I managed to remove the reversed duplicate values for Primary.Type.x and Primary.Type.y with identical values Ward.Date with:
test <- edgelist.primary.wards[!duplicated(lapply(as.data.frame(t(edgelist.primary.wards),
stringsAsFactors=FALSE), sort)),]
The caveat, however, is that the resulting table deleted the Weight of the reversed pair as well, while this count is not aggregated into the Weight of the pair that is retained:
Ward.Date | Primary.Type.x | Primary.Type.y | Weight |
---|---|---|---|
1 2017-01-03 | Theft | Robbery | 1 |
2 2017-01-05 | Burglary | Battery | 2 |
Instead, the target output should be:
Ward.Date | Primary.Type.x | Primary.Type.y | Weight |
---|---|---|---|
1 2017-01-03 | Theft | Robbery | 2 |
2 2017-01-05 | Burglary | Battery | 4 |
Looking forward to seeing your suggestions!