
Is it possible to do partial string matches using the %in% operator in R? I know there are many ways to find partial string matches with stringr and similar tools, but %in% fits more easily into my current code.

For instance, imagine this vector:

x <- c("Withdrawn", "withdrawn", "5-Withdrawn", "2-WITHDRAWN", "withdrawnn")

I want each of these to be TRUE because every string contains "withdrawn" (ignoring case), but only the first is TRUE:

x %in% c("Withdrawn")
[1]  TRUE FALSE FALSE FALSE FALSE

I tried using a regex to at least make it case insensitive, but that made everything FALSE:

x %in% c("(?i)Withdrawn")
[1] FALSE FALSE FALSE FALSE FALSE

So, is it possible to get TRUE for all of these using the %in% operator, perhaps with a wrapper? Case sensitivity is less of a concern because it is easy to handle with tolower() or toupper(); what matters to me is that the code also matches "withdrawn", "withdrawnn", and "5-withdrawn".

EDIT: This question was marked as a duplicate of "Case-insensitive search of a list in R"; however, it is different because it asks whether partial string matches are possible using the %in% operator. The linked question does not use the %in% operator at all.

J.Sabree
  • Use `grep`/`grepl` with the regex, see [Case-insensitive search of a list in R](https://stackoverflow.com/questions/5671719/case-insensitive-search-of-a-list-in-r) – Wiktor Stribiżew Jun 18 '19 at 13:13
  • @WiktorStribiżew, I know that there are other ways to match strings, but this question is trying to see how it could be done using the %in% operator. The link that you sent does not discuss using the %in% operator. – J.Sabree Jun 18 '19 at 13:24
  • But https://stackoverflow.com/questions/40174604/r-in-operator-control-case-sensitivity?noredirect=1&lq=1 is closed with that exact question. So, it is a valid dupe. – Wiktor Stribiżew Jun 18 '19 at 13:29
  • @WiktorStribiżew, that one only asks for case sensitivity. Mine is asking primarily for partial string match using the %in% operator. In my request, I even mentioned that the case sensitivity was a minor request, and I can remove it if that will help make it not seem like a duplicate. – J.Sabree Jun 18 '19 at 13:33
  • Added [partial string matching - R](https://stackoverflow.com/questions/23901500/partial-string-matching-r) to the dupe links. Still, use grep/grepl. – Wiktor Stribiżew Jun 18 '19 at 13:34
  • @WiktorStribiżew I disagree. It’s sufficiently different that it doesn’t obviously apply without explanatory comment. – Konrad Rudolph Jun 18 '19 at 13:58

1 Answer


%in% does not support this: It’s a wrapper for the match function, which uses equality comparison to establish matches, not regular expression matching. However, you can implement your own:

`%rin%` = function (pattern, list) {
     vapply(pattern, function (p) any(grepl(p, list)), logical(1L), USE.NAMES = FALSE)
}

And this can be used like %in%:

〉'^foo.*' %rin% c('foo', 'foobar')
[1] TRUE
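To see how this behaves on the vector from the question, here is a small sketch. It repeats the `%rin%` definition so it runs on its own; the variable names `pattern_left` and `pattern_right` are only for illustration.

```r
# Same definition as above, repeated so this snippet is self-contained.
`%rin%` <- function(pattern, list) {
  vapply(pattern, function(p) any(grepl(p, list)), logical(1L), USE.NAMES = FALSE)
}

x <- c("Withdrawn", "withdrawn", "5-Withdrawn", "2-WITHDRAWN", "withdrawnn")

# Pattern on the left: one result per pattern, not one per element of x.
pattern_left <- "Withdrawn" %rin% x
print(pattern_left)
# [1] TRUE

# Operands swapped: each element of x is now used as a pattern and
# searched for inside the single string "Withdrawn".
pattern_right <- x %rin% "Withdrawn"
print(pattern_right)
# [1]  TRUE FALSE FALSE FALSE FALSE
```

The first call returns a single value because there is a single pattern; the second returns five values, but only the exact string "Withdrawn" occurs within "Withdrawn".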

Note that this does not do exactly what you asked for, because it behaves the way grepl does: pattern matching is asymmetric. The left-hand side is the pattern and the right-hand side is the vector being searched, so you can’t swap the two. If you just want to match a vector against a single regular expression, use grepl directly:

〉grepl("(?i)Withdrawn", x)
[1] TRUE TRUE TRUE TRUE TRUE

Or, if you prefer using an operator:

`%matches%` = grepl
〉"(?i)Withdrawn" %matches% x
[1] TRUE TRUE TRUE TRUE TRUE
Konrad Rudolph
  • THANK YOU! This explanation of how %in% works, how it can't be swapped left and right, and when to use it instead of grepl is exactly what I needed. Thank you again! – J.Sabree Jun 18 '19 at 14:20