1

I am trying to check if two variables have a one to one relation. One of the two variables contains characters of address, while the other contain a ID to the address. I would like to see if it's a one to one correspondence. I was thinking about converting characters to ASCII code or assign them a value using a math function. But I want to know if there's other easier and more effective ways to do so.

thelatemail
  • 91,185
  • 12
  • 128
  • 188
Fang Li
  • 59
  • 1
  • 1
  • 5
  • Can you provide a *minimal* sample of your data? It is unclear what you mean at the moment. See [creating a great reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – thelatemail Oct 02 '13 at 02:29
  • Hello and welcome to SO. To help make a reproducible example, you can use `reproduce()` . Instructions are here: http://bit.ly/SORepro – Ricardo Saporta Oct 02 '13 at 02:33

1 Answers1

4

You can use table, and check if the resulting matrix has exactly one 1 in each row and in each column. That also tells you where the duplicates are.

d <- data.frame( 
  x = sample( LETTERS, 10, replace=TRUE ),
  y = sample( LETTERS, 10, replace=TRUE )
)
m <- table(d) != 0
all( rowSums( m ) == 1 ) && all( colSums( m ) == 1 )

But if there is a lot of data, this is not very efficient. You can use a sparse matrix instead.

library(Matrix)
m <- sparseMatrix(
  i = as.numeric( as.factor( d$x ) ),
  j = as.numeric( as.factor( d$y ) ),
  x = rep( 1, nrow(d) )
)
m <- m > 0
all( rowSums( m ) == 1 ) && all( colSums( m ) == 1 )

You can also use sqldf.

library(sqldf)
sqldf( "SELECT x, COUNT( DISTINCT y ) AS n FROM d GROUP BY x HAVING n > 1" )
sqldf( "SELECT y, COUNT( DISTINCT x ) AS n FROM d GROUP BY y HAVING n > 1" )

You can also simply count how many different pairs you have: it should be the same as the number of distinct values of x and of y.

nrow( unique(d) ) == length(unique(d$x)) && nrow( unique(d) ) == length(unique(d$y))
Vincent Zoonekynd
  • 31,893
  • 5
  • 69
  • 78