Writing a regex expression that finds 'zz' in a word but not at the start and the end

Question

I am having some difficulty writing a regex that finds words in a text that contain 'zz', but not at the start or the end of the word. These are two of my many attempts:

pattern = re.compile(r'(?!(?:z){2})[a-z]*zz[a-z]*(?!(?:z){2})')
pattern = re.compile(r'\b[^z\s\d_]{2}[a-z]*zz[a-y][a-z]*(?!(?:zz))\b')

Thanks

Can you please clarify what the input looks like? I think some have (mis)understood that you are matching against individual words instead of whole sentences or word lists. You’ll get different answers if you are matching “dazzle” vs “buzz\ndazzle zap razzle” – pilcrow Dec 04 '21 at 12:37
Imagine having a text in a book and trying to find words that meet the criteria I listed. Jan already provided a solution. Thanks for trying to help. – shillos Dec 04 '21 at 12:46
What about a word with several occurrences of zz: azzazza, azzzzza? – Casimir et Hippolyte Dec 04 '21 at 16:20

score 3 · Accepted Answer · answered Dec 04 '21 at 12:33

Well, the direct translation would be

\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b

See a demo on regex101.com.
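
As a minimal sketch of how that pattern could be applied from Python, assuming the input is one string of whitespace-separated words (the sample sentence is the one used in the list-comprehension example in this answer; the names `PATTERN` and `matches` are only illustrative):

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(r"\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b")

text = "lorem ipsum buzz mezzo mix zztop but this is all"

# The pattern has no capturing groups, so findall returns whole matches.
matches = PATTERN.findall(text)
print(matches)  # ['mezzo']
```

Here `buzz` is rejected because its only zz sits at the end of the word, and `zztop` is rejected by the `(?!zz)` check right after the opening word boundary.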

Programmatically, you could use

text = "lorem ipsum buzz mezzo mix zztop but this is all"

words = [word 
         for word in text.split()
         if not (word.startswith("zz") or word.endswith("zz")) and "zz" in word]

print(words)

Which yields

['mezzo']

See a demo on ideone.com.


Jan


Yeah very logical to assume that xD – shillos Dec 04 '21 at 12:51

bobble bubble · Answer 2 · 2021-12-04T14:06:48.787

3

Another idea is to use non-word boundaries.

\B matches at any position between two word characters as well as at any position between two non-word characters ...

\w*\Bzz\B\w*

See this demo at regex101

Be aware that the pattern above also matches words where the zz is part of a longer run of z. For exactly two:

\w*(?<=[^\Wz])zz(?=[^\Wz])\w*

Another demo at regex101

Use any of those patterns with the (?i) flag for caseless matching if needed.
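
A rough sketch of how the two patterns differ in Python. The sample text is made up for illustration (it is not from the question), and the variable names are hypothetical:

```python
import re

# Both patterns are copied from this answer.
LOOSE = re.compile(r"\w*\Bzz\B\w*")
EXACT = re.compile(r"\w*(?<=[^\Wz])zz(?=[^\Wz])\w*")

# Made-up sample text, not from the original question.
text = "buzz dazzle zztop bzzzt mezzo"

loose_hits = LOOSE.findall(text)
exact_hits = EXACT.findall(text)

# The inline (?i) flag must sit at the very start of the pattern.
caseless_hits = re.findall(r"(?i)\w*\Bzz\B\w*", "PIZZA")

print(loose_hits)     # ['dazzle', 'bzzzt', 'mezzo']
print(exact_hits)     # ['dazzle', 'mezzo']
print(caseless_hits)  # ['PIZZA']
```

The first pattern accepts `bzzzt` because a zz surrounded by word characters exists inside the run of three z, while the second requires a non-z word character on both sides of the zz.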

edited Dec 04 '21 at 14:06

answered Dec 04 '21 at 13:49

bobble bubble


Very good idea, +1! – Jan Dec 04 '21 at 14:13
I tried your method as well but it detects words that have 'zz' at the end and at the beginning which is not wanted – shillos Dec 04 '21 at 15:25
@shillos Have you tried the demos? Should actually work :) and thank you @Jan! – bobble bubble Dec 04 '21 at 15:42
Good to read your answers again :-) – The fourth bird Dec 06 '21 at 15:19
@4th Bird, I'm happy about your comment, very kind! – bobble bubble Dec 06 '21 at 23:14

Casimir et Hippolyte · Answer 3 · 2021-12-04T16:50:06.217

2

You can use lookarounds:

\b(?!zz)\w+?zz\w+\b(?<!zz)

demo

or not:

\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b

demo

Limited to ASCII letters, this last pattern can also be written:

\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b
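
A small sketch comparing the three patterns in Python, using a made-up sample that includes the multi-zz words raised in the comments on the question (`azzazza`, `azzzzza`). The dictionary keys and variable names are only illustrative:

```python
import re

# The three patterns from this answer, keyed by an informal label.
patterns = {
    "lookarounds": r"\b(?!zz)\w+?zz\w+\b(?<!zz)",
    "no lookarounds": r"\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b",
    "ASCII only": r"\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b",
}

# Made-up sample text, lowercase ASCII only.
text = "buzz dazzle zztop azzazza azzzzza mezzo"

results = {name: re.findall(p, text) for name, p in patterns.items()}

for name, found in results.items():
    print(name, found)
# Each pattern finds: ['dazzle', 'azzazza', 'azzzzza', 'mezzo']
```

On this lowercase ASCII sample all three agree. They can diverge on input containing digits, underscores or non-ASCII letters, since `\w` covers those and `[a-z]` does not.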

edited Dec 04 '21 at 16:50

answered Dec 04 '21 at 16:37

Casimir et Hippolyte


score 0 · Answer 4 · answered Dec 04 '21 at 12:27

You can use negative lookahead and negative lookbehind assertions in the regex.

>>> import re
>>> text = 'ggksjdfkljggksldjflksddjgkjgg'
>>> re.findall(r'(?<!^)g{2}(?!$)', text)
['gg']


ThePyGuy


score 0 · Answer 5 · answered Dec 04 '21 at 12:30

Your criteria just mean that the first and last letters cannot be z. So we simply have to make sure the first and last letters are not z, and that there is a zz somewhere in the text.

Something like

^[^z].*zz.*[^z]$

should work
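
A minimal sketch of trying this pattern from Python. Because it is anchored with ^ and $, the sketch assumes each word is tested on its own after splitting the text on whitespace; the sample words and variable names are made up for illustration:

```python
import re

# Pattern copied from this answer; anchored, so it is applied per word.
pattern = re.compile(r"^[^z].*zz.*[^z]$")

# Made-up sample words, split on whitespace.
words = "buzz mezzo zztop zazzle".split()

kept = [word for word in words if pattern.search(word)]
print(kept)  # ['mezzo']
```

Note that under this reading of the criteria, `zazzle` is rejected too, even though it starts with a single z rather than with zz.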


SztupY

