3

I don't understand the behavior of the built-in function pmatch (partial string matching).

The documentation provides the following example:

pmatch("m",   c("mean", "median", "mode")) # returns NA instead of 1,2,3

but using:

pmatch("m", "mean") # returns 1, as I would have expected. 

Could anybody explain this behavior to me?

zx8754
Krisselack
  • 1
    `nomatch`: the value to be returned at non-matching or multiply partially matching positions. Note that it is coerced to integer. – Sandipan Dey Sep 07 '18 at 09:47
  • 1
    Did you only want an explanation of why `pmatch` is broken (returns NA) for multiple partial matches, or an actual solution? I posted the latter. – smci Sep 07 '18 at 10:03

2 Answers

4

Use grep instead; the way pmatch returns NA whenever an input has more than one partial match is incredibly annoying:

> grep("^m",   c("mean", "median", "mode"))
[1] 1 2 3

> grep("ed",   c("mean", "median", "mode"))
[1] 2

The only downside is that pmatch(x, table, ...) is also vectorized over its first argument x, whereas grep accepts only a single pattern, so grep can't take a vector of patterns. To match several patterns you can use stringi, or else loop over them with sapply.
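A minimal sketch of the looping workaround, using only base R. The names `tbl`, `prefixes` and `matches` are made up for illustration, and I've used `lapply` rather than `sapply` so the result is always a list even when the match counts differ. It also swaps `grep("^...")` for `startsWith`, which compares literal prefixes, so a prefix containing regex metacharacters needs no escaping:

```r
tbl      <- c("mean", "median", "mode")   # hypothetical lookup table
prefixes <- c("m", "me", "mo", "x")       # hypothetical prefixes to look up

# For each prefix, collect the indices of ALL table entries starting with it.
matches <- lapply(prefixes, function(p) which(startsWith(tbl, p)))
names(matches) <- prefixes

matches[["m"]]   # 1 2 3
matches[["me"]]  # 1 2
matches[["mo"]]  # 3
matches[["x"]]   # integer(0), i.e. no match
```

Unlike pmatch, this returns every matching position rather than NA for an ambiguous prefix, and an empty integer vector when nothing matches.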

smci
  • 1
    "incredibly annoying" to me too, mostly because, I don't know how to use it. – zx8754 Sep 07 '18 at 10:05
  • 4
    @zx8754: it's not that you don't understand. `pmatch` is literally broken-by-design. Its default behavior is stupid, and there's no switch to turn that off. Since its input is vectorized, of course you would expect it to handle partial-matches, but it doesn't. `grep` does everything you need and more - partial matches, capture groups, backreferences etc. I have never needed `pmatch` and never seen it used. – smci Sep 07 '18 at 19:04
1

As per the documentation:

nomatch: the value to be returned at non-matching or multiply partially matching positions. Note that it is coerced to integer.

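nomatch defaults to NA, so an input that partially matches more than one element of the table (with no exact match) returns NA. pmatch("m", "mean") returns 1 because there is only one candidate, which makes the partial match unique.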

pmatch("me",   c("mean", "median", "mode")) 
[1] NA  # returns NA instead of 1,2 since multiple partial matches

pmatch("mo",   c("mean", "median", "mode")) 
[1] 3   # since single partial match
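A few more single-input calls that show the same rule, as a small sketch (the variable names are only for illustration). They rely on the documented behavior that exact matches are tried before partial ones and that `nomatch` replaces the default NA:

```r
tbl <- c("mean", "median", "mode")   # same table as above

ambiguous <- pmatch("m", tbl)                 # NA: three partial matches
unique_hit <- pmatch("med", tbl)              # 2: only "median" starts with "med"
custom_na <- pmatch("m", tbl, nomatch = 0L)   # 0: nomatch replaces the NA

# An exact match takes precedence over partial matches:
exact <- pmatch("mean", c("mean", "meanest")) # 1, although "meanest" also starts with "mean"
```

So the NA is a signal of ambiguity rather than of "nothing found", and nomatch only changes which value is used to report it.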
Sandipan Dey