
I have two sets of co-ordinates:

  • Set A with 49,898 combinations of x and y
  • Set B with 36,404 combinations of x and y.
  • (Set A has all of the combinations in Set B plus an additional 13,494 combinations)
  • Solutions in either Excel or R are fine

I want to extract the 13,494 combinations that appear only in Set A. To try this in Excel or R, I have copied Set B’s x and y co-ordinate combinations into the same columns as Set A’s.

The table has two columns, Xcod (x) and Ycod (y).

I have read through a number of posts proposing Excel and R solutions that partly deal with this problem, but the output is always 49,898 combinations because they keep one copy of each duplicated value. I understand why this is, but I would like to delete those duplicates entirely (every copy, not just the repeats) so that the final output contains only Set A's unique 13,494 combinations.

[Excel] I used the following: Data -> Advanced Filter -> Unique records only

[R] I used the following code from the thread "How to filter for unique combination of columns from an R dataframe":

UniqRemDups <- unique(RemDups[,c('Xcod','Ycod')])

Any help/advice would be greatly appreciated.

smci
  • Can you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – Samuel Mar 14 '17 at 19:11
  • In this specific case you seem to have positive integer coordinates in a limited range, all multiples of 1000 (in fact 5000). So you could squeeze each coordinate pair into a single 32-bit integer, e.g. `(Xcod/1000)*1e4 + (Ycod/1000)`, which gives `4405*1e4 + 4725 = 44054725`. – smci May 20 '18 at 08:37
  • You don't want unique combinations between the two sets, you simply want Set A \ Set B, i.e. the set 'less' operation. And those aren't combinations, just plain rows of your dataset. – smci May 20 '18 at 08:40
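
The set difference described in the last comment can be sketched in base R as follows. This is only an illustration under assumptions: it presumes Set A and Set B are still available as two separate data frames, and the names `set_a`, `set_b`, `key_a`, `key_b` and `only_in_a`, as well as the toy values, are hypothetical and not from the question.

```r
# Hypothetical toy data standing in for Set A and Set B.
set_a <- data.frame(Xcod = c(4405000L, 4415000L, 4425000L, 4435000L),
                    Ycod = c(4725000L, 4725000L, 4725000L, 4725000L))
set_b <- data.frame(Xcod = c(4415000L, 4435000L),
                    Ycod = c(4725000L, 4725000L))

# Build one text key per row; the separator keeps (12, 3) distinct from (1, 23).
key_a <- paste(set_a$Xcod, set_a$Ycod, sep = "_")
key_b <- paste(set_b$Xcod, set_b$Ycod, sep = "_")

# Keep only the rows of A whose key never occurs in B, i.e. A \ B.
only_in_a <- set_a[!(key_a %in% key_b), ]
print(only_in_a)
```

A text key is used here because it needs no assumption about the value range; the packed integer key suggested in the earlier comment could be substituted for `paste()` when the coordinates are known to be multiples of 1000.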

1 Answer

One way in R is the uniquecombs function from the mgcv package, which returns the distinct rows of a data frame or matrix.

data <- structure(list(Xcod = c(4405000L, 4415000L, 4425000L, 4435000L, 
4445000L, 4455000L, 4465000L, 4475000L, 4435000L, 4495000L, 4505000L, 
4515000L, 4525000L, 4535000L, 4545000L, 4555000L, 4565000L, 4575000L, 
4585000L), Ycod = c(4725000L, 4725000L, 4725000L, 4725000L, 4725000L, 
4725000L, 4725000L, 4725000L, 4725000L, 4725000L, 4725000L, 4725000L, 
4725000L, 4725000L, 4725000L, 4725000L, 4725000L, 4725000L, 4725000L
)), .Names = c("Xcod", "Ycod"), class = "data.frame", row.names = c(NA, 
-19L))

library(mgcv)
unique_rows <- uniquecombs(data)
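
Note that a distinct-rows result still keeps one copy of each repeated combination. If the goal is to drop every copy of a combination that occurs more than once in the stacked table, a base R sketch along the following lines may help. It assumes Set A has no internal duplicates, so a repeated row can only come from Set B; `stacked`, `has_twin` and `occurs_once` are hypothetical names with toy values, not part of the answer above.

```r
# Hypothetical stacked table: Set A's rows followed by Set B's rows.
stacked <- data.frame(
  Xcod = c(4405000L, 4415000L, 4425000L, 4415000L),
  Ycod = c(4725000L, 4725000L, 4725000L, 4725000L)
)

# duplicated() flags only the later copies, so scan in both directions
# to flag every row that has a twin anywhere in the table.
has_twin <- duplicated(stacked) | duplicated(stacked, fromLast = TRUE)

# Drop all flagged rows, leaving combinations that occur exactly once.
occurs_once <- stacked[!has_twin, ]
print(occurs_once)
```
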
user25494