Match if there is a X in the first and X the second Y can be 0

Question

I am currently working on a program. I need a regex that takes Y and X and that pairs of X is separated by Y. It does not have to be equal numbers, but it cannot contain multiple X'es on side of each other.

Examples:

# Don't match:
XXXYYYYY
#Match:
XYXYYYY
X

My try so far:

{Y*[X|^X]Y*[X|^X]Y*}*

The problem is that if there is a X in the first and X in the second the Y still can be 0. Can i directly test for double X's?

@Unihedron That was not the point, for you to react on that, sorry, but all i need is some theory that i maybe can test for double X's or something like that — simonkaspers1, Oct 27 '14 at 09:29
What kind of RE you are looking for? [tag:Perl] style regex, or [tag:awk] style ERE, or the plain RE used by [tag:sed]? — TrueY, Oct 27 '14 at 09:38

score 3 · Answer 1 · edited May 23 '17 at 11:49

3

Since the answers above uses look-ahead, this answer present a solution in vanilla regular expression:

^(X?Y)*X?$

The solution above assumes empty string is allowed. Otherwise:

^((X?Y)+X?|X)$

(Feel free to make the groups non-capturing)

Thanks to Unihedron for the simplification XY|Y to X?Y.

If anyone still have doubt about the validity of this answer, solve the below equations:

R1 = XR2 + YR3 + λ
R2 =       YR3 + λ
R3 = XR2 + YR3 + λ

The DFA can be drawn from the equations above.

Remove the + λ in R1 if empty string is disallowed.

edited May 23 '17 at 11:49

Community

1
1

answered Oct 27 '14 at 10:00

nhahtdh

55,989
15
126
162

Following a bit of logical induction, you'll have `^(X?Y)*X?$` which would work best. Well done! – Unihedron Oct 27 '14 at 10:01

score 2 · Accepted Answer · answered Oct 27 '14 at 09:23

2

What's so unusual about it?

^(?:X(?!X)|Y)+$

DEMO

Explanation: it's just a series of X and Y where an X cannot be followed by another X (negative lookahead).

answered Oct 27 '14 at 09:23

Lucas Trzesniewski

50,214
11
107
158

Since `Y` is logically going to appear equal or more than the amount of `X`s, changing the alternation to include `Y` in the front may make it slightly more efficient in matching the input. – Unihedron Oct 27 '14 at 09:51
@Unihedron You can even do better: `^(?:Y++|X(?!X))+$` or `^(?:(?>Y+)|X(?!X))+$` but since OP didn't specify the regex flavour I left that optimization out. – Lucas Trzesniewski Oct 27 '14 at 09:54
Insighted comment and great point! However as a point of note, in PCRE `Y++` is defined as the same as `(?>Y+)`. – Unihedron Oct 27 '14 at 09:55
Yes, but for instance, the possessive quantifier modifier (`++`) is not available in .NET so you have to use atomic groups (`(?>...)`)... again, everything depends on the flavour :) – Lucas Trzesniewski Oct 27 '14 at 09:56

score 0 · Answer 3 · answered Oct 27 '14 at 09:23

0

^(?!.*?XX)X[YX]*$

Try this.This should fulfill your requirement.See demo.

http://regex101.com/r/sU3fA2/8

answered Oct 27 '14 at 09:23

vks

67,027
10
91
124

Casimir et Hippolyte · Answer 4 · 2014-10-27T10:12:00.220

0

You can use this pattern:

^Y*(XY+)*X?$

if you want to ensure that there is at least one character, you can check the length separately or add a lookahead at the begining:

^(?=.)Y*(?:XY+)*X?$

about catastrophic backtracking:

If you use a DFA regex engine there is no problem since there is no backtracking.

if you use a NFA regex engine, you can prevent the catastrophic backtracking with several ways, examples:

^Y*(XY|Y)*X?$       # possible but not really efficient

^Y*(?>XY+)*X?$      # using an atomic group (if available)

^Y*(?:XY+)*+X?$     # using a possessive quantifier (if available)

^Y*(?=((XY+)*))\1X?$ # emulate `(?>(?:XY+)*)`

^Y*(?:(?=(XY+))\1)*X?$  # emulate `(?>XY+)*`

edited Oct 27 '14 at 10:12

answered Oct 27 '14 at 09:58

Casimir et Hippolyte

88,009
5
94
125

Interesting theory, though for the sake of not burning up users' code with catastrophic backtracking, better mention using `(?> )` instead of `(?: )` where the language supports it :) – Unihedron Oct 27 '14 at 10:00
`^Y*(XY|Y)*X?$` Drop the first `Y*` (since `Y*` is already covered in the `(XY|Y)*` and the solution will be the same as mine. The rest are redundant, since `^(XY|Y)*X?$` will never have backtracking problem. – nhahtdh Oct 27 '14 at 10:56
@nhahtdh: `^Y*(XY|Y)*X?$` can be indeed shorten to `^(X?Y)*X?$`. However this pattern stay less efficient since the number of steps before failing is 4x greater than patterns that use atomic grouping. So they are not redundant at all. So, with unihedron simplification, the most efficient pattern is `^(?>X?Y+)*+X?$` – Casimir et Hippolyte Oct 27 '14 at 14:58
@CasimiretHippolyte: I can't observe 4x difference. However, I do see a 2x difference if I use `^(X?Y)*+X?$`. The `(?>` is unnecessary when possessive quantifier is used. Anyway, it is still linear regardless. – nhahtdh Oct 27 '14 at 15:15
@nhahtdh: you will better see the 4x difference with these links: http://regex101.com/r/tN7cN9/1 http://regex101.com/r/yZ5jX7/1 . Indeed, `^(?:X?Y+)*+X?$` should suffice. – Casimir et Hippolyte Oct 27 '14 at 15:22
Oh, there is a `+` after `Y`. The number of steps doesn't reflect the number of characters in the string scanned. Since `(X?Y+)*+` and `(X?Y)*+` on `YYYYY` input both requires a scan on all characters involved, but `Y+`'s scan records as 1 step, while `(X?Y)*+`'s scan records as at least 5 steps (or more). The slowdown probably does happen, though, when the engine creates state for the `X?Y` pattern for every Y and XY, while `X?Y+` only create state for every X. – nhahtdh Oct 27 '14 at 15:42
@nhahtdh: Yes, but the difference decreases when the proportion of X tends to 50%. – Casimir et Hippolyte Oct 27 '14 at 16:01

Match if there is a X in the first and X the second Y can be 0

4 Answers4