
I need to determine how to match a src string against a pat string like:

src = 'AAAABBBB'
pat = '(A+|B+)B+'

However, '+' matches one or more occurrences, so it might be extremely slow when src is huge. Since I know exactly what src is, I could design a pat that matches exactly how many 'A' or 'B' characters appear in each run, like:

pat = '(A|B)\4B\4'

But I also need a syntax that matches 4 occurrences or fewer, like:

pat = '(A|B)\4(or less)B\4(or less)'

Does anyone know this syntax?

Martijn Pieters
Jun

1 Answer


You can specify a specific number of repetitions with the {m} syntax, where m is the number of repetitions expected:

A{4}B{4}

would require exactly four A and four B characters.
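As a minimal sketch of the exact-count form, assuming Python's `re` module (which the quoted documentation below describes) and the `src` string from the question; the name `exact` is only illustrative:

```python
import re

src = "AAAABBBB"

# Exactly four 'A' characters followed by exactly four 'B' characters.
exact = re.compile(r"A{4}B{4}")

# fullmatch() requires the whole string to fit the pattern.
print(exact.fullmatch(src) is not None)          # True
print(exact.fullmatch("AAABBBB") is not None)    # False: only three 'A'
```

`fullmatch` is used here so that extra characters before or after the run count as a failure; `search` or `match` would accept a longer string that merely contains the run.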

There is similar syntax to specify a range instead of a fixed number; from the Regular Expression syntax documentation:

{m}
Specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{6} will match exactly six 'a' characters, but not five.

{m,n}
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. For example, a{3,5} will match from 3 to 5 'a' characters. Omitting m specifies a lower bound of zero, and omitting n specifies an infinite upper bound. As an example, a{4,}b will match aaaab or a thousand 'a' characters followed by a b, but not aaab. The comma may not be omitted or the modifier would be confused with the previously described form.

{m,n}?
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 'a' characters, while a{3,5}? will only match 3 characters.
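The range forms can be tried out with a short sketch. It assumes that "4 or fewer" in the question means between one and four occurrences (mirroring the original `+`), which gives `{1,4}`; writing `{,4}` instead would also allow zero occurrences. The names `greedy`, `lazy` and `bounded` are illustrative only.

```python
import re

# Greedy vs. non-greedy on the six-character string from the documentation.
text = "aaaaaa"
greedy = re.match(r"a{3,5}", text).group()   # takes as many as allowed
lazy = re.match(r"a{3,5}?", text).group()    # takes as few as allowed
print(greedy)  # aaaaa
print(lazy)    # aaa

# The question's pattern with '+' replaced by a bounded range:
# one to four repetitions in each run (assumed reading of "4 or fewer").
bounded = re.compile(r"(A{1,4}|B{1,4})B{1,4}")
print(bounded.fullmatch("AAAABBBB") is not None)    # True
print(bounded.fullmatch("ABB") is not None)         # True
print(bounded.fullmatch("AAAAABBBB") is not None)   # False: five 'A'
```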

Martijn Pieters
  • Good! But what if I cannot determine if it is greedy or non-greedy? – Jun Jun 04 '14 at 17:19
  • @Jun: What do you mean? Variable-size quantifiers (`*`, `+`, `{m,n}`) all come in greedy and non-greedy variants; the latter by adding `?` to the quantifier. `{m}` can never be greedy or non-greedy, it matches an exact number of characters instead. – Martijn Pieters Jun 04 '14 at 17:21
  • OK, I understand: greedy versus non-greedy only matters when the two variants would match differently; if the pattern doesn't match at all, there is nothing to choose between. Thanks – Jun Jun 04 '14 at 17:25
  • Not to forget: `{m,n}` and not `{m, n}` (the pattern should not have a space, otherwise it will interpret the curly braces as literals) – Martin Thoma Jul 21 '17 at 07:41
  • This might be a bit old, but what if I want to match exactly 2 or 4? Like `{2|4}`, but something that works. – sheldonzy Oct 21 '17 at 09:03
  • @sheldonzy: then use a group of two patterns separated by `|`, e.g. `(?:f{4}|f{2})`. Put the long version first if later patterns could also match the extra characters; it'll be considered before testing if the shorter version matches (`|` alternatives are always non-greedy). – Martijn Pieters Apr 03 '21 at 18:14
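Two points from the comments above can be checked with a short sketch, assuming CPython's `re` module: the "exactly 2 or 4" alternation, including why the longer alternative goes first, and the effect of a space inside the braces. The variable names are illustrative only.

```python
import re

# Exactly two or exactly four 'f' characters, longer alternative first.
two_or_four = re.compile(r"(?:f{4}|f{2})")
print(two_or_four.fullmatch("ff") is not None)     # True
print(two_or_four.fullmatch("ffff") is not None)   # True
print(two_or_four.fullmatch("fff") is not None)    # False

# Order matters when the match is not anchored at the end:
# alternatives are tried left to right and the first success wins.
long_first = re.match(r"(?:f{4}|f{2})", "ffff").group()
short_first = re.match(r"(?:f{2}|f{4})", "ffff").group()
print(long_first)    # ffff
print(short_first)   # ff

# With a space after the comma, the braces are no longer a quantifier;
# the pattern matches the literal text "a{1, 3}" instead.
spaced = re.compile(r"a{1, 3}")
print(spaced.fullmatch("aa") is not None)        # False
print(spaced.fullmatch("a{1, 3}") is not None)   # True
```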