(Note: I found a reasonable solution using `String.split()` instead of `Regexp.match()`, but I'm still interested in the theoretical regexp question.)
Given a string that may or may not end with the letter `a`, and may have any number of letters `a` in other positions, is there a regexp that lets me capture the trailing `a` if present as one group, and all previous characters as another? E.g.:
Input | Group 1 | Group 2 |
---|---|---|
'a' | '' * | 'a' |
'b' | 'b' | '' |
'ba' | 'b' | 'a' |
'baaa' | 'baa' | 'a' |
'baaab' | 'baaab' | '' |
* `nil` instead of the empty string would also be acceptable
Some things I've tried that haven't worked:
- The naive approach: `/^(.*)(a?)$/`
- The same, but with a numeric repetition limit: `/^(.*)(a{0,1})$/`
- The same, but with an atomic group: `/^(.*)((?>a?))$/`
- The same, but with negative lookahead in the first group: `/^(.*(?!=a))(a?)$/`
All of these fail to capture the trailing `a` if present:
input | expected | actual |
---|---|---|
'a' | '', 'a' | 'a', '' |
'ba' | 'b', 'a' | 'ba', '' |
'baaa' | 'baa', 'a' | 'baaa', '' |
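For anyone who wants to reproduce the failures, here is a minimal sketch that runs the four patterns listed above against the same inputs. The method names `attempts` and `attempt_captures` are just illustrative helpers, not part of any library. My understanding is that in each case the greedy `.*` consumes the whole string first, and since `a?` is allowed to match nothing, the overall match succeeds without ever giving the `a` back.

```ruby
# Hypothetical helper: the four attempted patterns, keyed by a short label.
def attempts
  {
    naive:     /^(.*)(a?)$/,
    counted:   /^(.*)(a{0,1})$/,
    atomic:    /^(.*)((?>a?))$/,
    lookahead: /^(.*(?!=a))(a?)$/
  }
end

# Hypothetical helper: returns the capture groups each pattern yields
# for the given input, e.g. { naive: ["ba", ""], ... }.
def attempt_captures(input)
  attempts.transform_values { |re| re.match(input).captures }
end

%w[a ba baaa].each do |input|
  attempt_captures(input).each do |name, groups|
    puts "#{input.inspect} #{name}: #{groups.inspect}"
  end
end
```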
The closest I've been able to come is to use `|` to split between the cases with and without a trailing `a`. This comes close, but at the expense of producing twice as many capture groups, such that I'll need to do some additional checking to decide whether to use the left or right pair of groups:
/^(?:(.*)(a)$|(.*[^a])()$)/
input | expected | actual |
---|---|---|
'a' | '', 'a' | '', 'a', nil, nil |
'b' | 'b', '' | nil, nil, 'b', '' |
'ba' | 'b', 'a' | 'b', 'a', nil, nil |
'baaa' | 'baa', 'a' | 'baa', 'a', nil, nil |
'baaab' | 'baaab', '' | nil, nil, 'baaab', '' |
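The "additional checking" I have in mind would look roughly like the sketch below. The method name `split_trailing_a` is made up for illustration; it simply picks whichever pair of groups actually participated in the match.

```ruby
# Hypothetical helper: collapse the four capture groups from the
# alternation pattern into a single [prefix, trailing_a] pair.
def split_trailing_a(input)
  m = /^(?:(.*)(a)$|(.*[^a])()$)/.match(input)
  return nil unless m          # e.g. the empty string matches neither branch

  left  = m.captures[0, 2]     # groups 1 and 2: the "ends with a" branch
  right = m.captures[2, 2]     # groups 3 and 4: the "no trailing a" branch
  left.first.nil? ? right : left
end

%w[a b ba baaa baaab].each do |input|
  puts "#{input.inspect} => #{split_trailing_a(input).inspect}"
end
```

This gives the expected pairs, but it is exactly the extra step I was hoping a single pattern could avoid.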
The solution I've found is to throw out `Regexp.match` entirely and just use `String.split`. This comes close enough for my purposes:
input.split(/(a?)$/)
input | expected | actual |
---|---|---|
'a' | '', 'a' | '', 'a' |
'b' | 'b', '' | 'b' (close enough) |
'ba' | 'b', 'a' | 'b', 'a' |
'baaa' | 'baa', 'a' | 'baa', 'a' |
'baaab' | 'baaab', '' | 'baaab' (close enough) |
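If the missing second element in the "close enough" rows matters, a small wrapper can pad it back in. This is only a sketch; `head_and_trailing_a` is a hypothetical name, and it relies on `split` including the captured group in its result and dropping trailing empty strings.

```ruby
# Hypothetical helper: always return a two-element [prefix, trailing_a]
# pair, padding with '' when split drops the empty trailing capture.
def head_and_trailing_a(input)
  head, tail = input.split(/(a?)$/)
  [head.to_s, tail.to_s]       # nil.to_s is '', so missing parts become ''
end

%w[a b ba baaa baaab].each do |input|
  puts "#{input.inspect} => #{head_and_trailing_a(input).inspect}"
end
```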
This works, but I'd still like to know if there's a way to do it as a straight regexp match.