Let's assume I have the following lookup table:
(lkp <- structure(list(a = c("a", "a", "a", "b", "c"),
b = c("a1 a2", "a3 a2", "a3", "a1", "a1")),
row.names = c("lkp_1", "lkp_2", "lkp_3", "lkp_4", "lkp_5"),
class = "data.frame"))
# a b
# lkp_1 a a1 a2
# lkp_2 a a3 a2
# lkp_3 a a3
# lkp_4 b a1
# lkp_5 c a1
I want to check if another data.frame
, x
, say, is a subset of lkp
, with the important additional requirement, that for column b
matching means that lkp$b
need only to contain x$b
.
The following example should make clear what I mean:
(chk <- list(c1 = structure(list(a = c("a", "a"), b = c("a2", "a2")), row.names = c(NA, -2L), class = "data.frame"),
c2 = structure(list(a = "b", b = "a1"), row.names = c(NA, -1L), class = "data.frame"),
c3 = structure(list(a = c("a", "a"), b = c("a1", "a1")), row.names = c(NA, -2L), class = "data.frame"),
c4 = structure(list(a = c("a", "a"), b = c("a3", "a2")), row.names = c(NA, -2L), class = "data.frame")))
# $c1
# a b
# 1 a a2
# 2 a a2
# $c2
# a b
# 1 b a1
# $c3
# a b
# 1 a a1
# 2 a a1
# $c4
# a b
# 1 a a3
# 2 a a2
chk$c1
: row 1 matches rowlkp_1
(andlkp_2
) as columna
is the same andlkp$b
containsa2
chk$c2
andchk$c4
match as wellchk$c3
does NOT match. While each row matcheslkp_1
,c4
is not a subset aslkp
would need to contain 2 different rows which match.
In principle I am looking for a merge (or join) where the join condition would use some sort of fuzzy matching.
I have found and read these two SO answers:
- How to check if a row is a subset of a data.frame?
- R merge data frames, allow inexact ID matching (e.g. with additional characters 1234 matches ab1234 )
And especially the second answer looks promising. However, I do not need approximate matching but rather some sort of does_contain
relationship instead of pure equality. So maybe a regex
solution would work?
Expected Outcome
magic_is_subset_function <- function(chk, lkp) {
# ...
}
sapply(chk, magic_is_subset_function, lkp = lkp)
# [1] TRUE TRUE FALSE TRUE