1

How do I write a regular expression to find:

  • ABA
  • AABBAA
  • AAABBBAAA
  • AAAABBBBAAAA
  • AAAAABBBBBAAAAA

Rules:

  • It always begins and ends with A
  • It always has a B in the middle
  • The first group of A's, the group of B's in the middle, and the second group of A's all must have the same amount of letters.
  • Must be valid all the way to infinity
anubhava
  • 761,203
  • 64
  • 569
  • 643
antoniopgs
  • 59
  • 2
  • 2
    You can satisfy all your requirements but one with the regular expression `^(A+)(B+)\1$`. [Demo](https://regex101.com/r/vWmWUP/2/). The exception is the need to confirm the sizes of the strings in the two capture groups are equal. For that I believe you will need a separate statement such as `$1.size == $2.size`, which will depend on the language you are using. – Cary Swoveland Jun 03 '20 at 18:54
  • @CarySwoveland I'm using Python. What are those statements? I'm googling it, but haven't figured it out yet. Thanks. – antoniopgs Jun 03 '20 at 19:31
  • You need to first see if the string matches the regex I gave (which I should have written `r'^(A+)(B+)\1$'`. If there's a match you then need to use Python code to test whether the length of the string in capture group 1 equals the length of the string in capture group 2. I believe you need to get a match object `m` and test whether the size of the string `m.group(0)` equals the size of the string `m.group(1)`. Is there a Pythonite in the audience? – Cary Swoveland Jun 03 '20 at 19:53
  • The argument being made in the comments above is that a regular expression _isn't adequate_ on its own. Adding extra code to do a check after the regex is complete is not in any way providing a regex that meets all the stated requirements. – Charles Duffy Jun 03 '20 at 20:37
  • (This is probably appropriate; I don't believe a fully complete answer is *possible*; for true/standardized ERE or BRE regex forms, so someone needs to be very careful about how the word "regex" is interpreted if they want to state otherwise). – Charles Duffy Jun 03 '20 at 20:38
  • BTW, is this homework? Why would you choose a regex as the right tool for this job, and insist that only a regex-based answer is acceptable? – Charles Duffy Jun 03 '20 at 20:39

2 Answers2

0

No possible regex language can describe this, because the language described isn't a regular language. Quoting from the Wikipedia link above:

The collection of regular languages over an alphabet Σ is defined recursively as follows:

  • The empty language Ø, and the empty string language {ε} are regular languages.
  • For each a ∈ Σ (a belongs to Σ), the singleton language {a} is a regular language.
  • If A and B are regular languages, then A ∪ B (union), A • B (concatenation), and A* (Kleene star) are regular languages.
  • No other languages over Σ are regular.

No combination of fixed strings, union, concatenation, or Kleene-star operations (which are necessarily zero-or-more without further constraints) can describe the above, because there is no operator that allows length-matching assertions. (Similarly, "regex" languages that allow backreferences are not true regex languages either).

Thus, any "regex syntax" that can describe the above language is not truly a regular expression syntax.

Charles Duffy
  • 280,126
  • 43
  • 390
  • 441
0

The code below will work for the scenario described above and also with the exception @Cary Swoveland is talking about.

code

import re
count = 0
string = "AABBAA"
for i in string:
    if i == string[0]:
        count +=1
    else:
        break
#count = 2
R = "^(A+)(B{" + str(count) + "})(\\1)$" #^(A+)(B{2})(\1)$
#print(R)
r = re.compile(R)
print(re.findall(r, string))

you will have to count the the number of occurrences of A in string as you want the pattern to match those strings which have equal number of B(s).

output

[('AA', 'BB', 'AA')]

when string = "AAABBBBBBAAA"

output []

sage07
  • 83
  • 8
  • You're constructing a new regex specific to each string. This is not _a_ regex that matches only strings in the form the OP describes. – Charles Duffy Jun 03 '20 at 22:13