1

Let Sigma = {a,b}. The regular expression RE = (ab)(ab)*(aa|bb)*b over Sigma.

  1. Give a string of length 5 in the set denoted by RE. Correct answer: abaab My answer: (ab)aab

I placed the parentheses there because they are in the RE. I understand why I don't need to, but is my answer incorrect? I tested it using RegEx, and the expression (ab)aab matched the text abaab, but it did not match when I reversed this.

Mathemats
  • 1,185
  • 4
  • 22
  • 35
jtetra13
  • 37
  • 1
  • 7

2 Answers2

1

Your answer is wrong because the parentheses do not belong to your set of symbols. The string (ab)aab cannot be generated using only symbols present in the {a,b} set.

Even more, you were asked to provide a string of 5 symbols but (ab)aab has length 7.

Parentheses have special meaning in regex. They create sub-regexps and capturing groups. For example, (ab)* means ab can be matched any number of times, including zero. Without parentheses, ab* means the regex matches one a followed by any number of bs. That's a different expression.

For example:

  • the regular expression (ab)* matches the empty string (ab zero times), ab, abab, ababab, abababab and so on;
  • the regular expression ab* matches a (followed by zero bs), ab, abb, abbb, abbbb and so on.

The first set of parentheses in your example is useless if you are looking only for sub-regexps. Both (ab) and ab expressions match only the ab string. But they can be used to capture the matched part of the string and re-use it either with back references or for replacement.

When parentheses are used for sub-expressions in regular expressions, they are meta-characters, do not match anything in the string. In order to match an open parenthesis character ( (found in the string) you have to escape it in the regex: \(.

Several strings that match the regular expression (ab)(ab)*(aa|bb)*b over Sigma = { 'a', 'b' }: abb, ababb, abababababb, ababababaabbaaaabbb.

The last string (ababababaabbaaaabbb) matches the regex pieces as follows:

ab            - (ab)
ababab        - (ab)*    - ('ab' 3 times)
aabbaaaabb    - (aa|bb)* - ('aa' or 'bb', 5 times in total)
b             - b

A regex that matches the string (ab)aab is \(ab\)(ab)*(aa|bb)*b but in this case
Sigma = { 'a', 'b', '(', ')' }

axiac
  • 68,258
  • 9
  • 99
  • 134
  • Why is it that (ab)aab has a length of 7? – jtetra13 Mar 04 '16 at 01:18
  • Errr... because it contains 7 characters? It is a regular string, not a `regex`. It should be over the `{a,b}` alphabet; it contains extra symbols but that doesn't mean we can ignore them. – axiac Mar 04 '16 at 01:21
  • So for it to be of length 5 it would have to be something like (ab)b ? – jtetra13 Mar 04 '16 at 01:25
  • Yes, but you still break the other requirement: the string must contain only symbols present in your alphabet (which is `{a,b}`). – axiac Mar 04 '16 at 01:25
  • A regular expression is a "formula" that can be used to generate words over an alphabet (a set of symbols). It can be used backwards too, to verify that a given word can be generated by the `regex` (matches the `regex`). – axiac Mar 04 '16 at 01:26
  • What about the RE (0|1)* ? That has parentheses and yet those are not included in the length of any of its concatenations. – jtetra13 Mar 04 '16 at 01:34
  • The length of the regular expression doesn't matter and it's not related to the length of the strings it matches. And yes, because they are not escaped, the parentheses present in the `regex` do not match any character from the string. They are just markers and they have a meaning only in the `regex`. Only the characters from the `regex` that are not special `regex` characters can appear in the matched string. A special `regex` character must be escaped by putting a `\` in front of it if you want it to have its literal value and not its special `regex` meaning. – axiac Mar 04 '16 at 01:44
1

() is syntax of regex and has its semantic meaning, you may have a look here and here

Similar to ^ or & and other reserved character in regex, you have to special handle to match them using regex, for example: Regex to Match Symbols: !$%^&*()_+|~-=`{}[]:";'<>?,./

Also, specifically in your question context, () should not appear as part of the string as it is not in the charater set (alphabet) {a,b}. And the string you provide has a lengh of 7 instead of 5, so it is correct to say it is wrong.

Community
  • 1
  • 1
shole
  • 4,046
  • 2
  • 29
  • 69