2

I need a regex that will match DK | EK | KD | KE. Seems simple enough with K[D|E]|[D|E]K but this does not work (at least not at regx101) for a sequence with KDK or KEK. This regex will match the first 2 characters, but will not match the final 2 characters. It will match a string of 4 characters, such as KDKD or KDKE or KEKE, but adding a K to any of these also fails to match the final 2 DK or EK characters.

What regex will allow these types of sequences (some of which are palindromes) to match?

EDIT To clarify a bit, KDK would match the first option (KD) but not the second option (DK) in my simple regex. Clearly these two matches overlap and I'm looking for a regex that will allow the match when there is an overlap.

EDIT2 And, I've realized this isn't necessarily limited to palindromes - original post corrected, above.

EDIT3 All of the following should match. I don't claim this is an exhaustive list. For non-matches, any character that does not consist of part of the list pairs (KD, KE, EK, DK) should not match. For example KDDDK would match the first 2 characters (KD) and the last 2 characters (DK) but not the central D as it is never preceded nor followed by a K.

KD
KE
EK
DK
KDK
KEK
EKE
DKD
EKD
DKE
KDKD
KDKE
KEKD
KEKE
KDDK
KEEK
DKDK
EKDK
KDEK
DKKD
DKKE
EKKE
EKKD
KEEKE 
Bohemian
  • 412,405
  • 93
  • 575
  • 722
KirkD-CO
  • 1,603
  • 1
  • 22
  • 35
  • Should be `[DE]K|K[DE]` for the first part of your question [demo](https://regex101.com/r/Q0J8dX/1). Can you please expand on the actual desired matches? It's not exactly clear about which characters you want to match further... – Roko C. Buljan Jun 27 '23 at 22:44
  • What should it match if it encounters `DKD`? – CAustin Jun 27 '23 at 22:46
  • Regex won't match parts that are matched in other matches. So KEK won't match KE and EK – Jerry Jeremiah Jun 27 '23 at 22:52
  • @KirkD-CO To be clear, you're trying to find all instances of those four 2-character sequences, even if they overlap? – CAustin Jun 27 '23 at 22:59
  • @CAustin - correct. The overlap is the key piece. From the discussion, it sounds like that isn't possible with regex. – KirkD-CO Jun 27 '23 at 23:02
  • It's... kind of possible, with the use of lookaround. You can find all instances of `D` that are followed by `K`, for example, but the `K` won't be included in the match itself. Not sure if that solves your problem. – CAustin Jun 27 '23 at 23:04
  • The regex101 link you shared is completely empty. Would be at least nice to see the needed matches strings and the not needed. Further explaining what's the desired would be really helpful to get a proper answer without hitting too many comments requesting explanation. Anyways, for what I guess what you want to achieve - Regex might not be the right tool for the job. – Roko C. Buljan Jun 27 '23 at 23:04
  • Example using lookahead to locate the first character of each match: https://regex101.com/r/zE62sm/1 – CAustin Jun 27 '23 at 23:06
  • https://stackoverflow.com/questions/233243/how-to-check-that-a-string-is-a-palindrome-using-regular-expressions – Roko C. Buljan Jun 27 '23 at 23:10
  • @RokoC.Buljan - I corrected the regex link – KirkD-CO Jun 27 '23 at 23:12
  • @KirkD-CO in the link you provided, the test strings are all the valid desired matches, right? Which are the not desired? – Roko C. Buljan Jun 27 '23 at 23:18
  • Your requirements are unclear. To help with clarification, please edit your question to provide a list of examples (each a different edge case) that should match and another list of non-matches. – Bohemian Jun 27 '23 at 23:25
  • 1
    @RokoC.Buljan - correct, all of these are desired matches. I did not include undesired matches, but something like KDDDKD should match all except the 3rd character - the D at position 3 should not match, while all others should. From the discussion above, it sounds like I'm not able to have overlapping matches. I did not realize this. – KirkD-CO Jun 27 '23 at 23:25
  • @Bohemian - the regex101 link now contains a series of desired matches. I will copy those into the main post, above. – KirkD-CO Jun 27 '23 at 23:26
  • @RokoC.Buljan - closest yet! But it fails on KDDDK in that it matches the central D which is not adjacent to a K. This one should match the first two characters and the last 2 characters, but not the central D. – KirkD-CO Jun 27 '23 at 23:36
  • @KirkD-CO yeah, I just realized after your question edit – Roko C. Buljan Jun 27 '23 at 23:36
  • Seems the problem is overlapping with all the odd length sequences having repeats. 8^( – KirkD-CO Jun 27 '23 at 23:37
  • Would this be closer to the desired? https://regex101.com/r/t5FU3A/1 – Roko C. Buljan Jun 27 '23 at 23:49
  • You can also use lookbehind (seems easier to reason about the output this way) https://regex101.com/r/t5FU3A/2 – Roko C. Buljan Jun 27 '23 at 23:51
  • Both of those seem to work with all the random tests I've typed so far. Nice! I've (obviously) not worked with regex enough to have found those techniques. Write this into an answer, please, and I'll accept it. – KirkD-CO Jun 27 '23 at 23:56
  • @KirkD-CO thanks for adding examples. can you please add a list of non-matching examples too? For example, should `KKEDK` not match? Or `KDKKKEK`? ie all D or E must have an adjacent K, but must all K have an adjacent D or E? – Bohemian Jun 28 '23 at 00:07
  • Another idea with use of [lookbehinds](https://regex101.com/r/6d8OU5/1): [`(?:K[DE]|[DE]K|(?<=K)[DE]|(?<=[DE])K)+`](https://regex101.com/r/6d8OU5/1) or match an *optional* `K` around `[DE]` and use a lookbehind to require it: [`(?:K?[DE]K?(?<=K.|.K))+`](https://regex101.com/r/oqO4Do/1) – bobble bubble Jun 29 '23 at 07:58

3 Answers3

1

Don't pipe delimit inside your alternatives [D|E], use just [DE].

What you could do is use a positive lookahead (?=) or lookbehind (?<=) to not consume the matches:

(?=(K[DE]|[DE]K))

https://regex101.com/r/jotsST/1

String Matches results
KD KD
KE KE
DK DK
EK EK
DKE DK, KE
KDK KD, DK
KEK KE, EK
KEEKE KE, EK, KE
KDDDK KD, DK
EKDEEK EK, KD, EK
Roko C. Buljan
  • 196,159
  • 39
  • 305
  • 313
1

edit once more : Since 3 is the magic number this will do odd combinations.
Where it reques a minimum of 3 for symetry.
This one does 9,7,5,3.

Use this as a template.
Just add more groups (total odd number) to extend it out to infinity.
If you assume a palindrome of D,K,E letters ...

(?=([DKE])((?!\1)[DKE])([DKE])([DKE]?)([DKE]?)([DKE]?)([DKE]?)([DKE]?)([DKE]?))(?:\9\8\7\6\5\4\3\2\1|\7\6\5\4\3\2\1|\5\4\3\2\1|\3\2\1)

https://regex101.com/r/QRKqNo/1

Explanation

 (?=
    ( [DKE] )                     # (1)
    (                             # (2 start)
       (?! \1 )
       [DKE] 
    )                             # (2 end)
    ( [DKE] )                     # (3)
    ( [DKE]? )                    # (4)
    ( [DKE]? )                    # (5)
    ( [DKE]? )                    # (6)
    ( [DKE]? )                    # (7)
    ( [DKE]? )                    # (8)
    ( [DKE]? )                    # (9)
 )
 (?:
    \9 \8 \7 \6 \5 \4 \3 \2 \1    # 9 ckeck
  | \7 \6 \5 \4 \3 \2 \1          # 7 check
  | \5 \4 \3 \2 \1                # 5 check
  | \3 \2 \1                      # 3 check
 )

To do the same but in a overlapped way, move the checks at the end to inside of the assertion at the bottom of it.

Note that the symmetry acts as its own backtracking filter, which to me
I found surprising.

sln
  • 2,071
  • 1
  • 3
  • 11
1

Use look arounds to match, but not consume, K's when matching D or E.

^(K*([ED](?=K)|(?<=K)[ED])K*)*

See live demo.

Bohemian
  • 412,405
  • 93
  • 575
  • 722