converting CFG to regular expression

Question

Here's a CFG that generates strings of 0s, 1s, or 0s and 1s arranged like this (001, 011) where one of the characters must have a bigger count than the other like in 00011111 or 00000111 for example.

S → 0S1 | 0A | 0 | 1B | 1
A → 0A | 0
B → 1B | 1

I tried converting it to regular expression using this guide but I got stuck here since I have trouble converting 0S1 given that anything similar to it can't be found in that guide.

S → 0S1 | 0+ | 0 | 1+ | 1    
A → 0A | 0    = 0+
B → 1B | 1    = 1+

One of my previous attempts is 0+0+1|0+1+1|1+|0+ but it doesn't accept strings I mentioned above like 00011111 and 00000111.

This might sound embarrassing but I have no idea what you ppl here are talking about, — Sieg, Oct 14 '21 at 06:15
What makes you think the language is regular? At first glance, it seems unlikely ("regular languages can't count") — rici, Oct 14 '21 at 06:32

Nikolay Handzhiyski · Answer 1 · 2021-10-14T08:49:26.583

Plug and Play

^(?!01$)(?!0011$)(?!000111$)(?!00001111$)(?=[01]{1,8}$)0*1*$

You cannot perfectly translate this to a regular expression, but you can get close, by ensuring that the input does not have equal number of 0 and 1. This matches up to 8 digits.

How it works

^ first you start from the beginning of a line
(?!01$) ensure that the characters are not 01
(?!0011$) ensure that the characters are not 0011
the same for 000111 and 00001111
then ensure that there are from 1 to 8 zeroes and ones (this is needed, to ensure that the input is not made of more digits like 000000111111, because their symmetry is not verified)
then match these zeroes and ones till the end of the line
for longer inputs you need to add more text, for up to 10 digits it is this: ^(?!01$)(?!0011$)(?!000111$)(?!00001111$)(?!0000011111$)(?=[01]{1,10}$)0*1*$ (you jump by 2 by adding one more symmetry validation)
it is not possible by other means with regular expressions alone, see the explanation.

Explanation

The A and B are easy, as you saw 0+ and 1+. The concatenations in S after the first also are easy: 00+, 0, 11+, 1, that all mixed into one lead to (0+|1+). The problem is with the first concatenation 0S1.

So the problem can be shorten to S = 0S1. This grammar is recursive. But neither left linear nor right linear. To recognize an input for this grammar you will need to "remember" how many 0 you found, to be able to match the same amount of 1, but the finite-state machines that are created from the regular grammars (often and from regular expressions) do not have a computation history. They are only states and transitions, and the machinery "jumps" from one state to the other and does not remember the "path" traveled over the transitions.

For this reason you need more powerful machinery (like the push-down automaton) that can be constructed from a context-free grammar (as yours).

converting CFG to regular expression

1 Answers1

Plug and Play

How it works

Explanation