I am trying to check if two variables have a one to one relation. One of the two variables contains characters of address, while the other contain a ID to the address. I would like to see if it's a one to one correspondence. I was thinking about converting characters to ASCII code or assign them a value using a math function. But I want to know if there's other easier and more effective ways to do so.
Asked
Active
Viewed 386 times
1
-
Can you provide a *minimal* sample of your data? It is unclear what you mean at the moment. See [creating a great reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – thelatemail Oct 02 '13 at 02:29
-
Hello and welcome to SO. To help make a reproducible example, you can use `reproduce(
)` . Instructions are here: http://bit.ly/SORepro – Ricardo Saporta Oct 02 '13 at 02:33
1 Answers
4
You can use table
, and check if the resulting matrix has exactly one 1
in each row and in each column. That also tells you where the duplicates are.
d <- data.frame(
x = sample( LETTERS, 10, replace=TRUE ),
y = sample( LETTERS, 10, replace=TRUE )
)
m <- table(d) != 0
all( rowSums( m ) == 1 ) && all( colSums( m ) == 1 )
But if there is a lot of data, this is not very efficient. You can use a sparse matrix instead.
library(Matrix)
m <- sparseMatrix(
i = as.numeric( as.factor( d$x ) ),
j = as.numeric( as.factor( d$y ) ),
x = rep( 1, nrow(d) )
)
m <- m > 0
all( rowSums( m ) == 1 ) && all( colSums( m ) == 1 )
You can also use sqldf
.
library(sqldf)
sqldf( "SELECT x, COUNT( DISTINCT y ) AS n FROM d GROUP BY x HAVING n > 1" )
sqldf( "SELECT y, COUNT( DISTINCT x ) AS n FROM d GROUP BY y HAVING n > 1" )
You can also simply count how many different pairs you have:
it should be the same as the number of distinct values of x
and of y
.
nrow( unique(d) ) == length(unique(d$x)) && nrow( unique(d) ) == length(unique(d$y))

Vincent Zoonekynd
- 31,893
- 5
- 69
- 78