This comes from a code golf post; the relevant portion is reproduced below.

I also don't know why I need {,2} instead of {,3} but it seems to work. If anyone knows the answer to that, let me know.

grepl('^M{,2}(C[MD]|D?C{,2})(X[CL]|L?X{,2})(I[XV]|V?I{,2})$',scan(,''))

For a simpler example:

rn <- c('X', 'MX', 'MMX', 'MMMX', 'MMMMX')
grepl('^M{,2}X$',rn)
[1]  TRUE  TRUE  TRUE  TRUE FALSE

Why does {,2} match three or fewer instances of M, as if it were {0,3}? Furthermore, why does this work at all? The regex guides I've found only mention a missing upper bound (like {3,}), not a missing lower bound. With the perl=TRUE option set, R returns FALSE for every element.

> grepl('^M{,2}X$',rn, perl=T)
[1] FALSE FALSE FALSE FALSE FALSE
CT Hall

2 Answers

{,n} is not a widely supported way to write "between 0 and n repetitions" in regex. It works in Python, but most other regex flavors treat it as either a syntax error or literal characters.

This is not to be confused with {n,} (n or more repetitions), which is ubiquitous.

Instead, just use {0,n}.
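
As a minimal sketch of that advice, the question's simpler example can be rewritten with an explicit lower bound. The {0,3} bound is an assumption about the intended range (zero to three M's), and the variable names are only illustrative:

```r
rn <- c("X", "MX", "MMX", "MMMX", "MMMMX")

# Explicit lower bound: zero to three M's, then a single X.
# (Assumed intent; the original pattern used {,2}.)
default_engine <- grepl("^M{0,3}X$", rn)
pcre_engine <- grepl("^M{0,3}X$", rn, perl = TRUE)

print(default_engine)
print(pcre_engine)
```

Both calls should agree, since {0,3} is standard quantifier syntax in either engine.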

CAustin

Here's the problem: with the lower bound left empty, the quantifier may be read as starting at 0 (though the link below shows it's a slightly different issue), and a pattern that allows zero repetitions can match anything. Observe:

grepl('X{,2}', c("x","X","XX","XXX","XXXX") )
[1] TRUE TRUE TRUE TRUE TRUE

Even the lowercase "x" gets a "pass", although it contains no "X" at all: the pattern is unanchored and allows zero repetitions, so it can match the empty string in any input.
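
The empty match can be reproduced without relying on {,2} at all. This sketch uses the explicit {0,2} form (an assumption about what was intended, with illustrative variable names) and compares an unanchored pattern with an anchored one:

```r
x <- c("x", "X", "XX", "XXX", "XXXX")

# Unanchored: zero X's is allowed, so the empty string matches in every input.
unanchored <- grepl("X{0,2}", x)

# Anchored: the whole string must consist of zero to two X's.
anchored <- grepl("^X{0,2}$", x)

print(unanchored)
print(anchored)

# Position and length of the first match in "x";
# a match.length of 0 would indicate an empty match.
print(regexpr("X{0,2}", "x"))
```

Anchoring with ^ and $ is what makes the quantifier's bounds observable in the result.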

Turns out that this has been asked and answered before:

Unexpected match of regex

IRTFM