I'm using R Studio with R version 3.2.2 and I want to filter for rows in a data frame, where I want to see only those rows, where the combination of values from two columns don't match to a valid combination stored value pairs in a second data frame. I've heard R is not designed to use loops in a performant way so I'm looking for a solution without using loops, if there are any.
Using this sample data I scaled down, the first data frame dfItem contains item IDs and two statuses.
> dfItem
ID Status_1 Status_2
1 1 1 1
2 2 1 2
3 3 2 1
4 4 3 3
5 5 2 3
6 6 1 1
Second data frame dfValidPairs contains all valid combinations of the two statuses.
> dfValidStatus
Status_1 Status_2
1 1 1
2 1 2
3 3 3
How can I check if a status pair from dfItem, consisting of Status_1 and Status_2, is in one of the valid statuses from dfValidPairs? In the end I would like to have only the rows from dfItem which have an invalid status pair, stored in a new data frame like this.
> dfInvalid
ID Status_1 Status_2
1 3 2 1
2 5 2 3
I can only think of for loops which iterate over the data frames. Do you know if there are any more performant possibilities?
Cheers, Felix
code for the objects
> dfItem <- data.frame( c(1, 2, 3, 4, 5, 6), c(1, 1, 2, 3, 2, 1), c(1, 2, 1, 3, 3, 1) )
> colnames( dfItem ) <- c( "ID", "Status_1", "Status_2" )
> dfValidStatus <- data.frame( c(1, 1, 3), c(1,2,3) )
> colnames( dfValidStatus ) <- c( "Status_1", "Status_2" )