
I have been searching around but couldn't find the answer.

I want to look for a 3-letter pattern where 2 of the letters are either D or E and the third one can be anything. The order doesn't matter.

For example, DEA, ESD, DZE and PDE should all match. Overlapping patterns should also be detected: for a sample string like 'EDEDEDADEDE', the search should find 'EDE', 'DED', 'EDE', 'DED', 'EDA', 'DAD', 'ADE', 'DED', 'EDE'.

The best I can come up with is [A-Z][DE]{2}|[DE][A-Z][DE]|[DE]{2}[A-Z], but this seems clunky. Is there a simpler solution?
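For reference, a minimal sketch (assuming Python 3's `re` module; `find_windows` is a hypothetical helper name) of how the alternation above can report overlapping matches. Wrapping it in a lookahead with a capturing group, `(?=(...))`, makes every match zero-width, so the scanner moves forward one character at a time and `re.findall` returns the captured 3-letter window at each position where it fits.

```python
import re

# The alternation from the question, wrapped in (?=(...)): the match itself is
# zero-width, so no characters are consumed and overlapping windows are found.
# findall() returns the text captured by group 1 at each matching position.
PATTERN = re.compile(r"(?=([DE]{2}[A-Z]|[DE][A-Z][DE]|[A-Z][DE]{2}))")


def find_windows(text):
    """Return every (possibly overlapping) 3-letter window with two D/E letters."""
    return PATTERN.findall(text)


print(find_windows("EDEDEDADEDE"))
# ['EDE', 'DED', 'EDE', 'DED', 'EDA', 'DAD', 'ADE', 'DED', 'EDE']
```

The same call on 'DEDEDEDE' yields the six windows DED, EDE, DED, EDE, DED, EDE.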

Thanks.

Josh
  • If you know the input is of length 3, then this regex would be simpler `[DE].?[DE]` – user1211 Oct 02 '18 at 03:33
  • Note that `[DE].?[DE]` fails for "PDE". – kungphu Oct 02 '18 at 05:46
  • Maybe https://regex101.com/r/BNI35Z/1? – Wiktor Stribiżew Oct 02 '18 at 08:18
  • Like kungphu pointed out, the order doesn't matter. I don't know where the D or E will be. – Josh Oct 02 '18 at 16:32
  • In comments to answers you mention you also need to match cases like "DPD". Please update your question to contain such test cases. As it is right now it can be understood like you want BOTH "d" and "e" and a third letter in your match. You failed to mention that for example "D" can occur twice. – Asunez Oct 03 '18 at 07:59

3 Answers


You don't need regex for this; it's much more readable without.

valid = ("D" in s) and ("E" in s)

If you also need to validate length, just stick len(s) == 3 on before the letter checks.
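The check above, with the length guard in front, can be sketched as a plain function. The name `is_valid` is hypothetical, and like the expression above it requires both a D and an E to be present.

```python
def is_valid(s):
    """True if s is exactly three characters long and contains both D and E."""
    # Length check first, then the two membership tests.
    return len(s) == 3 and "D" in s and "E" in s


for candidate in ("DEA", "ESD", "DZE", "PDE", "QQQ", "DEDE"):
    print(candidate, is_valid(candidate))
```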

If you're required to use regex, this answer seems to have the details covered.

import re

# Two zero-width lookaheads: the string must contain a D and an E somewhere.
DE = re.compile(r"(?=.*D)(?=.*E)")

all(map(DE.match, ("DEA", "ESD", "DZE", "PDE")))
# True
all(map(DE.match, ("DEA", "ESD", "DZE", "PDE", "QQQ")))
# False

Edit: Note that this assumes both D and E must be present in the string. That matches the provided examples, but not the problem statement as written (two of the three letters are D or E, so DPD would also qualify); I assumed the problem statement was not exactly accurate.
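If the literal rule is wanted instead (any two of the three letters are D or E, so DPD also counts), a count still avoids regex. A minimal sketch with hypothetical function names; sliding a 3-character window one position at a time also gives the overlapping matches the question asks for.

```python
def has_two_d_or_e(window):
    """True if window is 3 characters long and at least two of them are D or E."""
    return len(window) == 3 and sum(ch in "DE" for ch in window) >= 2


def overlapping_windows(text):
    """Check every 3-character window, advancing one position at a time,
    so overlapping windows are all reported."""
    return [text[i:i + 3] for i in range(len(text) - 2)
            if has_two_d_or_e(text[i:i + 3])]


print(has_two_d_or_e("DPD"))  # True
print(overlapping_windows("ADFDFAGERASDFSAERSEDSEDEFADF"))
```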

kungphu
  • Hi, this 3 letter pattern will be part of my regex search. I don't know how long it will be as long as it contains this at the beginning. – Josh Oct 02 '18 at 16:20
  • Then I'd just run the regex or other check against `str[0:3]`. – kungphu Oct 03 '18 at 00:35

Try this pattern (?=[^\s]{0,2}D)(?=[^\s]{0,2}E)...

It first ensures that a D and an E each appear within the next three non-whitespace characters (the negated character class [^\s] matches any character except whitespace).

For each letter there is separate positive lookahead:

  • (?=[^\s]{0,2}D) for D,
  • (?=[^\s]{0,2}E) for E.

If both lookaheads are satisfied, the trailing ... matches three characters.
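A quick way to try the pattern in Python 3 (a sketch; the sample strings are the question's examples plus DFD from the comments). Because both lookaheads demand a D and an E, a window such as DFD never matches, and because `findall` consumes each three-character match, overlapping windows are skipped.

```python
import re

# The pattern from this answer: a D within the next three non-whitespace
# characters, an E within the next three, then "..." consumes three characters.
PATTERN = re.compile(r"(?=[^\s]{0,2}D)(?=[^\s]{0,2}E)...")

for sample in ("DEA", "ESD", "DZE", "PDE", "DFD"):
    print(sample, PATTERN.findall(sample))
```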

Demo

Michał Turczyn
  • Can you give a bit more explanation on how this works? Especially what is happening inside the lookaheads? – Asunez Oct 02 '18 at 06:01
  • @Asunez I just added explanation :) – Michał Turczyn Oct 02 '18 at 06:01
  • Okay, so if I understand correctly, each lookahead is looking for 2 non-whitespace characters and a letter D/E afterwards - but how does this work for DZE, where D is before, not after, the non-whitespace character? – Asunez Oct 02 '18 at 06:04
  • @Asunez Everything is explained, if you have doubts, look at demo I provided. It works with `DZE` also. Pattern simply looks at closest word and looks for `D` and `E` separately, at most "two non-white space characters away". – Michał Turczyn Oct 02 '18 at 06:06
  • This failed to capture DFD in one of my trials. – Josh Oct 02 '18 at 16:30

How about:

\b(?=.?[DE].?[DE])[A-Z]{3}\b

Explanation:

\b              : word boundary
    (?=         : start lookahead, zero-length assertion that makes sure we have
        .?      : optional any character
        [DE]    : D or E
        .?      : optional any character
        [DE]    : D or E
    )           : end lookahead
    [A-Z]{3}    : exactly 3 capital letters
\b              : word boundary

See it in action:

https://regex101.com/r/uo7tv8/2

Python implementation:

import re

text = 'For example: DEA or ESD or DZE or PDE should all match, but not DEDE ABC DEF GHI JKL.'
regex = r"\b(?=.?[DE].?[DE])[A-Z]{3}\b"
print(re.findall(regex, text))

Output:

['DEA', 'ESD', 'DZE', 'PDE', 'DEF']

Edit according to comment:

import re

text = 'ADFDFAGERASDFSAERSEDSEDEFADF'
regex = r"(?=.?[DE].?[DE])[A-Z]{3}"
print(re.findall(regex, text))

Output:

['ADF', 'SED', 'SED']
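The 'ADF' in this output deserves a note: without the \b anchors, the lookahead .?[DE].?[DE] can reach one character past the three letters that [A-Z]{3} consumes, so 'ADF' is accepted on the strength of the D that follows it, and since findall then resumes after 'ADF', the window 'DFD' is never tried. A minimal check, assuming Python 3:

```python
import re

unanchored = re.compile(r"(?=.?[DE].?[DE])[A-Z]{3}")

# On its own, "ADF" contains only one D, so the lookahead fails.
print(unanchored.match("ADF"))           # None
# With a D after it, the lookahead borrows that fourth character.
print(unanchored.match("ADFD").group())  # ADF
```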
Toto
  • It didn't work for me. I ran the search and it found nothing. – Josh Oct 02 '18 at 16:31
  • Are there other characters around those 3? Please [edit your question](https://stackoverflow.com/posts/52601558/edit) and add some test cases, sample text and expected results. My regex works for strings that contain only 3 letters, as you can see in the link to regex101. – Toto Oct 02 '18 at 16:38
  • @Josh: You may want to replace anchors `^` and `$` with word boundary `\b` at the beginning and at the end of the regex. – Toto Oct 02 '18 at 16:39
  • Here is a test string you can try: 'ADFDFAGERASDFSAERSEDSEDEFADF'. You are supposed to find 'DFD', 'SED', 'SED'. – Josh Oct 02 '18 at 19:49
  • @Josh: This is not what you explained in your question. As I said above, [edit your question](https://stackoverflow.com/posts/52601558/edit) and add some test cases, sample text and expected results. What should be the result for `DEDEDEDE`? – Toto Oct 03 '18 at 08:28
  • Sorry if I wasn't clear. I edited my question and included a sample string. Per your question: 'DED', 'EDE', 'DED', 'EDE', 'DED', 'EDE'; 6 patterns should be detected. – Josh Oct 03 '18 at 17:49