1

I'm using R Studio with R version 3.2.2 and I want to filter for rows in a data frame, where I want to see only those rows, where the combination of values from two columns don't match to a valid combination stored value pairs in a second data frame. I've heard R is not designed to use loops in a performant way so I'm looking for a solution without using loops, if there are any.

Using this sample data I scaled down, the first data frame dfItem contains item IDs and two statuses.

> dfItem
  ID Status_1 Status_2
1  1        1        1
2  2        1        2
3  3        2        1
4  4        3        3
5  5        2        3
6  6        1        1

Second data frame dfValidPairs contains all valid combinations of the two statuses.

> dfValidStatus
  Status_1 Status_2
1        1        1
2        1        2
3        3        3

How can I check if a status pair from dfItem, consisting of Status_1 and Status_2, is in one of the valid statuses from dfValidPairs? In the end I would like to have only the rows from dfItem which have an invalid status pair, stored in a new data frame like this.

> dfInvalid
  ID Status_1 Status_2
1  3        2        1
2  5        2        3 

I can only think of for loops which iterate over the data frames. Do you know if there are any more performant possibilities?

Cheers, Felix


code for the objects

> dfItem <- data.frame( c(1, 2, 3, 4, 5, 6), c(1, 1, 2, 3, 2, 1), c(1, 2, 1, 3, 3, 1) )
> colnames( dfItem ) <- c( "ID", "Status_1", "Status_2" )
> dfValidStatus <- data.frame( c(1, 1, 3), c(1,2,3) )
> colnames( dfValidStatus ) <- c( "Status_1", "Status_2" )
Phoen
  • 257
  • 3
  • 16

1 Answers1

2

This is a type of merge - or actually here its opposite:

> library(dplyr)
> anti_join(dfItem, dfValidStatus)

Joining by: c("Status_1", "Status_2")
  ID Status_1 Status_2
1  5        2        3
2  3        2        1

The other useful merge or set operations are inner_join, left_join, right_join etc

Stephen Henderson
  • 6,340
  • 3
  • 27
  • 33