Remove Duplicates Based on Combined Sets

Question

I have a data frame that looks like this:

C <- data.frame(A_Latitude  = c(48.4459, 48.7     , 49.0275, 49.0275,   49.0275, 49.0275,   48.4459),
            A_Longitude = c(9.989    , 8.15   , 8.7539 , 8.7539 ,   8.7539 , 8.7539 , 9.989  ),
            B_Latitude  = c(49.0275, 48.4734,   48.4459, 48.9602,   48.9602, 48.4459,   49.0275),
            B_Longitude = c(8.7539 , 9.227  ,   9.989    , 9.2058 , 9.2058 , 9.989  , 8.7539 ))

The data frame consists of latitude/longitude coordinates for pairs of locations (A + B; i.e., A_Latitude/A_Longitude, B_Latitude/B_Longitude).

I would like to remove duplicates based on combined sets, i.e., remove rows where Location A/Location B is equivalent to Location B/Location A in another row: rows where A_Latitude / A_Longitude / B_Latitude / B_Longitude equals another row's B_Latitude / B_Longitude / A_Latitude / A_Longitude.

The answers to [Finding unique combinations irrespective of position [duplicate]] and [Removing duplicate combinations (irrespective of order)] are not helpful because those solutions do not account for combined sets of columns. That matters here because each location is identified by a latitude/longitude pair rather than by a single column.

Thank you in advance for your help.

CPak · Accepted Answer · 2018-05-16T17:50:34.723

One idea is to convert each lat/long pair to a string with toString(...), then sort the resulting 2-element string vector within each row so that the order of A and B no longer matters. Use the sorted vectors of strings to check for duplicates:

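# For each row, build a sorted pair of "lat, long" strings
keys <- lapply(seq_len(nrow(C)), function(i) sort(c(toString(C[i, 1:2]), toString(C[i, 3:4]))))
# Keep only the first row for each distinct pair
ans <- C[!duplicated(keys), ]
ans
#   A_Latitude A_Longitude B_Latitude B_Longitude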
# 1    48.4459      9.9890    49.0275      8.7539
# 2    48.7000      8.1500    48.4734      9.2270
# 4    49.0275      8.7539    48.9602      9.2058
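
For larger data frames, the same idea can probably be written without a per-row loop. This is only a minimal sketch: the names key_a, key_b, lo, hi and ans2 are made up for illustration, and it assumes that a repeated location is stored with exactly equal coordinate values, since the comparison is done on their string form.

```r
C <- data.frame(A_Latitude  = c(48.4459, 48.7, 49.0275, 49.0275, 49.0275, 49.0275, 48.4459),
                A_Longitude = c(9.989, 8.15, 8.7539, 8.7539, 8.7539, 8.7539, 9.989),
                B_Latitude  = c(49.0275, 48.4734, 48.4459, 48.9602, 48.9602, 48.4459, 49.0275),
                B_Longitude = c(8.7539, 9.227, 9.989, 9.2058, 9.2058, 9.989, 8.7539))

# One string key per location (hypothetical helper names)
key_a <- paste(C$A_Latitude, C$A_Longitude, sep = ", ")
key_b <- paste(C$B_Latitude, C$B_Longitude, sep = ", ")

# Order the two keys within each row, so (A, B) and (B, A) give the same pair
lo <- pmin(key_a, key_b)
hi <- pmax(key_a, key_b)

# Keep the first row for each distinct unordered pair of locations
ans2 <- C[!duplicated(data.frame(lo, hi)), ]
```

On the example data this should keep the same rows (1, 2 and 4) as the lapply() version.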

Here's a breakdown of the toString() steps for row 1:

toString(C[1,1:2])
# [1] "48.4459, 9.989"
toString(C[1,3:4])
# [1] "49.0275, 8.7539"
sort(c(toString(C[1,1:2]), toString(C[1,3:4])))
# [1] "48.4459, 9.989"  "49.0275, 8.7539"
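
To see why this catches swapped rows, the same steps can be run on row 3, which holds the two locations of row 1 in the opposite order. A small check using only the data from the question (row1 and row3 are illustrative names):

```r
C <- data.frame(A_Latitude  = c(48.4459, 48.7, 49.0275, 49.0275, 49.0275, 49.0275, 48.4459),
                A_Longitude = c(9.989, 8.15, 8.7539, 8.7539, 8.7539, 8.7539, 9.989),
                B_Latitude  = c(49.0275, 48.4734, 48.4459, 48.9602, 48.9602, 48.4459, 49.0275),
                B_Longitude = c(8.7539, 9.227, 9.989, 9.2058, 9.2058, 9.989, 8.7539))

toString(C[3, 1:2])
# [1] "49.0275, 8.7539"
toString(C[3, 3:4])
# [1] "48.4459, 9.989"

# After sorting, rows 1 and 3 produce the same pair of strings
row1 <- sort(c(toString(C[1, 1:2]), toString(C[1, 3:4])))
row3 <- sort(c(toString(C[3, 1:2]), toString(C[3, 3:4])))
identical(row1, row3)
# [1] TRUE
```

Because the sorted pairs match, duplicated() flags row 3 as a repeat of row 1.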
