With normal regular expressions, you can't do this - they can only match regular context-free languages (Type 3 in the Chomsky hierarchy of languages), while what you want to match is a classic example of a type 2 language.
Luckily, perl
regular expressions aren't very regular in the formal language theory sense. You can match this using a recursive regular expression:
$ perl -ne 'print if /^((?>A(?1)B|))$/' input.txt
AB
AAABBB
AAAAAAABBBBBBB
AAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBB
$ grep -P '^((?>A(?1)B|))$' input.txt
AB
AAABBB
AAAAAAABBBBBBB
AAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBB
(Where input.txt
contains all your test cases).
This matches either an empty string (0 A's followed by 0 B's), or a string starting with A, a successful recursive match of the pattern against the rest of the string minus the first and last characters, and ending with a B. If a B appears before an A, an A after a B, or the total number of A's and B's don't match, it thus fails. (?>regex)
is an optimization that prevents backtracking after a match failure.
If you want to enforce n >= 1
, a slight variation to lift one pair of A and B outside of the recursive section: ^A((?>A(?1)B|))B$
.