
I am trying to exclude some rows from a data.frame based on the matching criteria from a second data frame.

Let's suppose that the data frame I want to exclude rows from looks like this:

device.id <- c("F-23", "F-22", "F-20", "F-23", "F-23", "F-23")
test.id <- c(215,216,217,215,218,219)
patient.id <- c("66MKT030_2", "66MGT1220_2", "66MGT063_1", "66MKT030_2", "66MKT350_2", "66MGT063_1")
data.test <- as.data.frame( cbind(device.id, test.id, patient.id))

I want to exclude rows from data.test based on a second data frame, which holds the exclusion criteria defined by 3 variables (device.id, test.id and patient.id):

DEVICEID <- c("F-23", "F-22")
TESTID <- c(215,218)
PATIT <- c("66MKT030_2",  "66MKT350_2")
data.excl <- as.data.frame( cbind(DEVICEID, TESTID, PATIT))

The desired output should look something like this:

> data.test
  device.id test.id  patient.id
2      F-22     216 66MGT1220_2
3      F-20     217  66MGT063_1
5      F-23     218  66MKT350_2
6      F-23     219  66MGT063_1

where rows 1 and 4 from data.test were excluded because they match the device.id, test.id and patient.id included in row 1 of data.excl.

I have tried to do the following:

s <- data.test[ !which((data.test$test.id %in% data.excl$DEVICEID &
                           data.test$patient.id %in% data.excl$TESTID &
                           data.test$data.test %in% data.excl$PATIT)), ]

but it removes all rows of data.test.

I could use a for loop but my real data set is >4000 rows and the data frame with the exclusion criteria is >500 rows.

Is there a clever way of doing this?

Gregor Thomas
ACLAN
  • If you give them the same column names, it will be `dplyr::anti_join(data.test, data.excl)`. With the column names you show, `dplyr::anti_join(data.test, data.excl, by = c("device.id" = "DEVICEID", "test.id" = "TESTID", "patient.id" = "PATIT"))` – Gregor Thomas Dec 16 '19 at 14:08
  • One other comment: don't use `cbind` to build data frames. `cbind` coerces things to a matrix, which means coercing every column to `character`, and then when you use `as.data.frame`, every column becomes `factor` class. It's more efficient, less buggy, and less typing to use `data.test = data.frame(device.id, test.id, patient.id)` compared to `data.test <- as.data.frame( cbind(device.id, test.id, patient.id))` – Gregor Thomas Dec 16 '19 at 14:10
  • @Gregor thank you. Yes I did that and found the solution here: https://stackoverflow.com/questions/28702960/find-complement-of-a-data-frame-anti-join – ACLAN Dec 16 '19 at 14:18
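
For reference, here is a minimal base-R sketch of the anti-join idea from the comments, using the example data from the question. It is an illustration under a few assumptions, not the only way to do it: the data frames are built with `data.frame()` as suggested above, the names `key.test`, `key.excl` and `result` are made up for this example, and the `"|"` separator is assumed never to occur inside the values themselves.

```r
# Build the data frames directly with data.frame() so test.id stays numeric
data.test <- data.frame(
  device.id  = c("F-23", "F-22", "F-20", "F-23", "F-23", "F-23"),
  test.id    = c(215, 216, 217, 215, 218, 219),
  patient.id = c("66MKT030_2", "66MGT1220_2", "66MGT063_1",
                 "66MKT030_2", "66MKT350_2", "66MGT063_1"),
  stringsAsFactors = FALSE
)
data.excl <- data.frame(
  DEVICEID = c("F-23", "F-22"),
  TESTID   = c(215, 218),
  PATIT    = c("66MKT030_2", "66MKT350_2"),
  stringsAsFactors = FALSE
)

# One composite key per row, so all three columns have to match together.
# Assumption: "|" does not appear in any of the values.
key.test <- paste(data.test$device.id, data.test$test.id,
                  data.test$patient.id, sep = "|")
key.excl <- paste(data.excl$DEVICEID, data.excl$TESTID,
                  data.excl$PATIT, sep = "|")

# Keep only the rows whose key is absent from the exclusion table
result <- data.test[!(key.test %in% key.excl), ]
print(result)

# With dplyr installed, the call from the comment above expresses the same idea:
# dplyr::anti_join(data.test, data.excl,
#                  by = c("device.id" = "DEVICEID", "test.id" = "TESTID",
#                         "patient.id" = "PATIT"))
```

The composite key matters because testing each column with its own `%in%` checks the columns independently: row 5 (F-23, 218, 66MKT350_2) would be dropped that way, since each of its values appears somewhere in data.excl, even though no single exclusion row contains all three.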
