2

I need to write a regular expression that can detect a string that contains only the characters x,y, and z, but where the characters are different from their neighbors.

Here is an example

xyzxzyz = Pass

xyxyxyx = Pass

xxyzxz = Fail (repeated x)

zzzxxzz = Fail (adjacent characters are repeated)

I thought that this would work ((x|y|z)?)*, but it does not seem to work. Any suggestions?

EDIT

Please note, I am looking for an answer that does not allow for look ahead or look behind operations. The only operations allowed are alternation, concatenation, grouping, and closure

MZimmerman6
  • 8,445
  • 10
  • 40
  • 70
  • Why doesn't `((x|y|z)?)*` work? What doesn't it do that you want it to do? You can't just say "it doesn't work". – Andy Lester Dec 03 '12 at 02:38
  • If this is computer science kind of question where you are asked to write a regex for a grammar, then you should state the requirement that the regex can only consist of alternation, concatenation, grouping, and closure (*), since regex in practical use has more syntax that allows for a simple solution. – nhahtdh Dec 03 '12 at 02:40
  • 2
    @AndyLester: It obviously can match a string with repeated character. – nhahtdh Dec 03 '12 at 02:40
  • @nhahtdh It is a computer science question that only really allows for the mentioned operations. If it were real syntax I would have been fine doing this with the many resources that give examples of this online, but I was not able to really think of anything that could do it without look ahead and look behind operations – MZimmerman6 Dec 03 '12 at 02:51
  • @MZimmerman6: You should edit your question to reflect that fact - people would just assume you can use whatever you want. – nhahtdh Dec 03 '12 at 02:55

4 Answers4

13

Usually for this type of question, if the regex is not simple enough to be derived directly, you can start from drawing a DFA and derive a regex from there.

You should be able to derive the following DFA. q1, q2, q3, q4 are end states, with q1 also being the start state. q5 is the failed/trap state.

DFA

There are several methods to find Regular Expression for a DFA. I am going to use Brzozowski Algebraic Method as explained in section 5 of this paper:

For each state qi, the equation Ri is a union of terms: for a transition a from qi to qj, the term is aRj. Basically, you will look at all the outgoing edges from a state. If Ri is a final state, λ is also one of the terms.

Let me quote the identities from the definition section of the paper, since they will come in handy later (λ is the empty string and ∅ is the empty set):

(ab)c = a(bc) = abc
λx = xλ = x
∅x = x∅ = ∅
∅ + x = x
λ + x* = x*
(λ + x)* = x*

Since q5 is a trap state, the formula will end up an infinite recursion, so you can drop it in the equations. It will end up as empty set and disappear if you include it in the equation anyway (explained in the appendix).

You will come up with:

R1 = xR2 + yR3 + zR4 + λ
R2 =     + yR3 + zR4 + λ
R3 = xR2 +     + zR4 + λ
R4 = xR2 + yR3       + λ

Solve the equation above with substitution and Arden's theorem, which states:

Given an equation of the form X = AX + B where λ ∉ A, the equation has the solution X = A*B.

You will get to the answer.

I don't have time and confidence to derive the whole thing, but I will show the first few steps of derivation.

Remove R4 by substitution, note that zλ becomes z due to the identity:

R1 = xR2 + yR3 + (zxR2 + zyR3 + z) + λ
R2 =     + yR3 + (zxR2 + zyR3 + z) + λ
R3 = xR2 +     + (zxR2 + zyR3 + z) + λ

Regroup them:

R1 = (x + zx)R2 + (y + zy)R3 + z + λ
R2 =       zxR2 + (y + zy)R3 + z + λ
R3 = (x + zx)R2 +       zyR3 + z + λ

Apply Arden's theorem to R3:

R3 = (zy)*((x + zx)R2 + z + λ)
   = (zy)*(x + zx)R2 + (zy)*z + (zy)*

You can substitute R3 back to R2 and R1 and remove R3. I leave the rest as exercise. Continue ahead and you should reach the answer.

Appendix

We will explain why trap states can be discarded from the equations, since they will just disappear anyway. Let us use the state q5 in the DFA as an example here.

R5 = (x + y + z)R5

Use identity ∅ + x = x:

R5 = (x + y + z)R5 + ∅

Apply Arden's theorem to R5:

R5 = (x + y + z)*∅

Use identity ∅x = x∅ = ∅:

R5 = ∅

The identity ∅x = x∅ = ∅ will also take effect when R5 is substituted into other equations, causing the term with R5 to disappear.

nhahtdh
  • 55,989
  • 15
  • 126
  • 162
  • wow, thank you. I did not expect such an in depth answer, and I don't think the professor does either, but thank you so much! – MZimmerman6 Dec 03 '12 at 04:27
  • @HugoDozois: I supposed inf. loop expression means closure `*`? – nhahtdh Dec 03 '12 at 04:35
  • @HugoDozois: It is possible if you follow the steps above. Since the DFA is there, and you can verify it that the DFA is correct, there will exist a regex that matches a string which doesn't have consecutive appearances of x, y, z. – nhahtdh Dec 03 '12 at 04:39
  • @HugoDozois: The final expression doesn't have recursive property. Actually, it becomes a repeating sequence (possibly followed by or preceded by some string) that guarantees to not contain adjacent repeating character - not something you can derive by just thinking about it directly. – nhahtdh Dec 03 '12 at 04:56
  • @nhahtdh the link with paper doesn't work anymore. And it's unclear what you do for non-final states. In your example all states are final... – Coder-Man Sep 26 '18 at 13:13
  • @Coder-Man: Non-final states can be removed from the equation, it's explained in the Appendix – nhahtdh Sep 27 '18 at 03:30
  • @nhahtdh it does not say that it's a non-final state, it says that it's a trap state. A trap state is a state from which you cannot get out. A non-final state is something else. – Coder-Man Sep 27 '18 at 06:53
  • @Coder-Man: Oh, I'm sorry. It has been a long time since I wrote this. Non-final states will not have lambda as a term. – nhahtdh Sep 27 '18 at 10:44
  • @nhahtdh why is that so, intuitively? Is there a proof for this? – Coder-Man Sep 27 '18 at 10:52
  • 1
    @Coder-Man: Final state means that you can end right there with an empty string (lambda), or consume more character to transition to other states. Non-final state only has the option of transitioning to other states, so no empty string (lambda) – nhahtdh Sep 27 '18 at 10:57
  • @nhahtdh oh yeah, that makes sense, thanks. I should have understood it. And Brzozowski method can be used only with DFA, right? Or can we use it with NFA? – Coder-Man Sep 27 '18 at 10:59
3

This should do what you want:

^(?!.*(.)\1)[xyz]*$

(Obviously, only on engines with lookahead)

The content itself is handled by the second part: [xyz]* (any number of x, y, or z characters). The anchors ^...$ are here to say that it has to be the entirety of the string. And the special condition (no adjacent pairs) is handled by a negative lookahead (?!.*(.)\1), which says that there must not be a character followed by the same character anywhere in the string.

Amadan
  • 191,408
  • 23
  • 240
  • 301
1

I've had an idea while I was walking today and put it on regex and I have yet to find a pattern that it doesn't match correctly. So here is the regex :

^((y|z)|((yz)*y?|(zy)*z?))?(xy|xz|(xyz(yz|yx|yxz)*y?)|(xzy(zy|zx|zxy)*z?))*x?$

Here is a fiddle to go with it!

If you find a pattern mismatch tell me I'll try to modify it! I know it's a bit late but I was really bothered by the fact that I couldn't solve it.

trincot
  • 317,000
  • 35
  • 244
  • 286
Hugo Dozois
  • 8,147
  • 12
  • 54
  • 58
0

I understand this is quite an old question and has an approved solution as well. But then I am posting 1 more possible and quick solution for the same case, where you want to check your regular expression that contains consecutive characters.

Use below regular expression:

String regex = "\\b\\w*(\\w)\\1\\1\\w*";

Listing possible cases that above expression returning the result.

Case 1: abcdddd or 123444

Result: Matched

Case 2: abcd or 1234

Result: Unmatched

Case 3: &*%$$$ (Special characters)

Result: Unmatched

Hope this will be helpful... Thanks:)

Nikhil Lotke
  • 605
  • 7
  • 15