2

I am new to using regex but I feel my pattern may be too complex.

I am looking for a pattern with a minimum number of brackets and a maximum number of dots interspersed. I can't see a way for regex to count the total number of dots across the whole match rather than the number of consecutive dots.

For example:

...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............

If I want to identify a run of at least 25 (s with a maximum of 15 .s interspersed from the first ( to the last:

...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............

My regex is currently searching for a sequence with a maximum of 15 consecutive .s instead.

Is this possible? If not, should I be using an alternative (e.g. pyparsing)?

This is what I have so far:

(\.{0,15}\(){25,}
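To illustrate why this pattern limits consecutive dots rather than the total, here is a minimal Python sketch. The input string is made up for the demonstration (it is not taken from the data above): 25 brackets, each preceded by a single dot, so 25 dots overall.

```python
import re

# The pattern from the question: up to 15 dots before EACH bracket, 25+ times.
current = re.compile(r"(\.{0,15}\(){25,}")

# Hypothetical input: ".(" repeated 25 times -> 25 brackets and 25 dots in total.
sample = ".(" * 25

m = current.fullmatch(sample)
total_dots = m.group().count(".") if m else None

# The whole string is accepted even though it contains more than 15 dots.
print(m is not None, total_dots)  # True 25
```

The quantifier `{0,15}` is applied afresh on every repetition of the group, so nothing in the pattern tracks the running total.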
InSync
SJP
  • regex is not a correct tool for counting *anything*. – matszwecja Aug 10 '23 at 13:45
  • @matszwecja Since the counts of both chars are finite, you could enumerate all possible combinations that satisfy the restrictions and thus keep it regular, but in reality even with the given numbers involved, this isn't feasible. Just in terms of formal language theory, it is dependent and possibly unlimited counts that make a language non-regular. – user2390182 Aug 10 '23 at 13:50
  • But especially, if you are looking to find a matching number of closing parentheses, you definitely leave the realm of regular languages. – user2390182 Aug 10 '23 at 13:51
  • Why use regex to do a task that doesn't seem very hard with basic Python string methods (unless it's for the thrill of solving a puzzle, which I totally respect :) )? – Swifty Aug 10 '23 at 13:55
  • Do you mean at least 25 `(` and any amount of periods in between but no more than 15 *consecutive* periods, something like [this regex101 demo](https://regex101.com/r/ABgTop/1)? To me it's not so clear also if this sequence may only consist of `.` and `(` like in my demo, or also take the closing `)` into account. – bobble bubble Aug 10 '23 at 14:58
  • @matszwecja Just to prove otherwise, check out [`(?<d>){15}\((?:(?<b>\()|(?<-d>\.))+\((?<-b>){23}(?(b)(?!))`](https://regex101.com/r/SYHReL/2) ([shorter version](https://regex101.com/r/SYHReL/3)) and [this answer](https://stackoverflow.com/a/76822352) of mine. This doesn't work in Python though. – InSync Aug 10 '23 at 18:56

4 Answers

2

Based on @Freeman's idea combining regex and string operations:

import re

s = "...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............"

pattern = r"\([\(.]+\(" # this pattern starts and ends with '(' and may only contain '(' and '.'

matches = re.findall(pattern, s) # find all such patterns

for match in matches:
    print(match)
    print(f"('s in match: {match.count('(')}")  # number of opening brackets in the match
    print(f".'s in match: {match.count('.')}")  # number of dots in the match

Output:

((((((((.(((..((..((((.(((((((.(..(((((.(((.(((
('s in match: 36
.'s in match: 11
(((.((.(((((...((
('s in match: 12
.'s in match: 5

With the counts you can easily filter out matches according to your specific requirements.
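As a rough sketch of that filtering step, the counts can be checked in a list comprehension. The function name `filter_runs` and its parameters are made up for this example; the limits of 25 brackets and 15 dots come from the question.

```python
import re

s = "...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............"


def filter_runs(text, min_brackets=25, max_dots=15):
    """Return runs of '(' and '.' that satisfy both count limits."""
    # Same idea as the pattern above: a run that starts and ends with '('.
    candidates = re.findall(r"\([(.]+\(", text)
    return [
        run for run in candidates
        if run.count("(") >= min_brackets and run.count(".") <= max_dots
    ]


kept = filter_runs(s)
for run in kept:
    print(run.count("("), run.count("."), run)
```

Note that this only tests each maximal run as a whole. A run with too many dots is discarded entirely, even if a shorter stretch inside it would satisfy both limits.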

matszwecja
  • The regex will only ever find non-overlapping matches, which gives you just 2 matches instead of [12](https://regex101.com/r/hWJAcw/1). – InSync Aug 10 '23 at 19:04
2

You could use:

\((?!(?:\(*\.){16})(?:\.*\(){24,}

The pattern matches:

  • \( Match (
  • (?!(?:\(*\.){16}) Negative lookahead: assert that there are not 16 dots to the right of the current position, with only optional ( chars allowed between them
  • (?:\.*\(){24,} Repeat 24 or more times matching optional dots followed by matching a single (

Regex demo
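For reference, a minimal sketch of using this pattern from Python with the standard `re` module, run against the sample string from the question. The variable names are only illustrative.

```python
import re

text = "...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............"

# '(' not followed by 16 dots (with only '(' in between), then 24+ more brackets.
pattern = re.compile(r"\((?!(?:\(*\.){16})(?:\.*\(){24,}")

m = pattern.search(text)
if m:
    run = m.group()
    # Report where the run starts and how many of each character it contains.
    print(m.start(), run.count("("), run.count("."))
else:
    print("no match")
```

On this input the search should report a run starting at index 3 with 36 brackets and 11 dots. Keep in mind that the lookahead counts every dot in the stretch of `(` and `.` that follows, including dots after the last matched bracket.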

If you want to allow trailing dots:

\((?!(?:\(*\.){16}\.*\()(?:\.*\(){24,}

Regex demo
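A small sketch of the difference between the two variants, using a made-up input: 25 brackets with 15 dots between them, followed by one trailing dot and a closing bracket. The names `strict` and `relaxed` are just labels for this example.

```python
import re

strict = re.compile(r"\((?!(?:\(*\.){16})(?:\.*\(){24,}")
relaxed = re.compile(r"\((?!(?:\(*\.){16}\.*\()(?:\.*\(){24,}")

# Hypothetical input: 1 + 15 + 9 = 25 brackets, 15 dots in between,
# then a 16th dot that comes after the last opening bracket.
run = "(" + ".(" * 15 + "(" * 9
text = run + ".)"

strict_match = strict.search(text)
relaxed_match = relaxed.search(text)

# The first pattern rejects the run because the trailing dot is the 16th dot.
print(strict_match is None)
# The second pattern only counts the 16th dot if another '(' follows it.
print(relaxed_match is not None and relaxed_match.group() == run)
```

Both lines should print `True` under these assumptions.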

The fourth bird
  • Hmm, because of your negative lookahead, if you add `.` at the end of the 2nd example in your demo, the regex will fail to detect the (though still valid) match. – Swifty Aug 11 '23 at 08:09
  • @Swifty Because then there are 16 dots. If that match should also be valid, then you can add matching another opening bracket `\((?!(?:\(*\.){16}\.*\()(?:\.*\(){24,}` https://regex101.com/r/6uNKHI/1 – The fourth bird Aug 11 '23 at 08:22
  • But then it will still fail if you add a `(` after the `.` ; my point is that, if your regex finds a match but that match is invalidated when you add more chars to the text, the regex isn't good enough! – Swifty Aug 11 '23 at 08:36
  • @Swifty In that case, perhaps something like this with the group 1 value https://regex101.com/r/YBDhVy/1 – The fourth bird Aug 11 '23 at 19:26
1

I think you can combine regex with string manipulation in Python, for example:

import re

#sample data
text = "...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............"

#matches an innermost pair: '(' followed by any non-bracket characters and then ')'
pattern = r"\([^()]*\)"

#find all matches of the pattern
matches = re.findall(pattern, text)

#iterate through the matches
for match in matches:
    dot_count = match.count(".")
    if dot_count <= 15:
        print("Pattern matched!")
        break
else:
    print("Pattern not matched.")

Output:

Pattern matched!
Freeman
  • Combining regex and string operations is not a bad idea, although I don't think this particular regex matches what OP wanted. – matszwecja Aug 10 '23 at 13:57
  • Combining string operations was a great idea! Worked a charm for my pattern. Thanks Freeman! – SJP Aug 10 '23 at 15:20
  • @SJP Please kindly consider selecting my solution as the chosen answer if it resolves your issue, so that the topic can be closed. If you are still facing any difficulties, please let me know, and I would be happy to provide further assistance. Thank you kindly. – Freeman Aug 10 '23 at 17:58
1

Here's a stupid pure regex approach that matches all 12 times, with "the whole match" stored in group 2:

\(                       # Match a '('
(?=                      # then lookahead to
  (?:\.*\(){24}          # the 25th bracket
  (.*)                   # and capture anything after that until the end.
)                        # 
(?<=                     # Take a step back behind the first '('
  (?=                    # then assure that there are
    (                    # 
      \(*                # 
      (?:\.\(*){0,15}    # no more than 15 dots from there
    )                    # until
    \1$                  # the group 1 we captured.
  )                      # 
  \(                     # 
)                        # 

Try it on regex101.com.

The main idea is to use the last bracket in the series as the limit, then check the number of dots between those two. The lookbehind is actually not needed for verifying, only for capturing.
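The same idea can also be sketched without regex, which may make the logic easier to follow: from each `(`, walk forward to the 25th bracket and give up if the dot budget is exceeded or another character appears first. The function name `find_windows` and its parameters are made up for this illustration; it is not part of the regex above.

```python
text = "...((((((((.(((..((..((((.(((((((.(..(((((.(((.(((...))).))).)))))..)..))))))).))))..))..))).))))))))(((.((.(((((...((........))))))))))))............"


def find_windows(text, min_brackets=25, max_dots=15):
    """Return (start_index, window) pairs, one per qualifying '(' start."""
    results = []
    for start, ch in enumerate(text):
        if ch != "(":
            continue
        brackets = dots = 0
        for end in range(start, len(text)):
            c = text[end]
            if c == "(":
                brackets += 1
                if brackets == min_brackets:
                    # Reached the limiting bracket within the dot budget.
                    results.append((start, text[start:end + 1]))
                    break
            elif c == ".":
                dots += 1
                if dots > max_dots:
                    break  # too many dots before the limiting bracket
            else:
                break  # any other character (e.g. ')') ends the run
    return results


windows = find_windows(text)
print(len(windows))
print(windows[0])
print(windows[-1])
```

Each window ends at the 25th bracket counted from its start, so overlapping starts are all reported, which a single left-to-right regex match cannot do.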

The rest is the job of .finditer():

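import re

# Compact (non-verbose) form of the pattern above; `text` holds the input string.
regex = re.compile(r'\((?=(?:\.*\(){24}(.*))(?<=(?=(\(*(?:\.\(*){0,15})\1$)\()')

for match_no, match in enumerate(regex.finditer(text), 1):
  print(f'{match_no = }')
  print(f'index = {match.start(0)}')
  print(f'{match[2] = !r}\n')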

Try it:

match_no = 1
index = 3
match[2] = '((((((((.(((..((..((((.(((((((.('

...

match_no = 12
index = 17
match[2] = '((..((((.(((((((.(..(((((.(((.((('
InSync
  • Wow, I had a hard time understanding this (but I believe it helped improve my regex understanding); very nice! If I understand correctly, the last `\(` is to go forward for the next match? – Swifty Aug 11 '23 at 07:59
  • @Swifty That last `\(` is also the first `\(`. Since we don't have the luxury of consuming all those characters in group 0, we need to match the first `(`, and then "take a step back" to place the lookahead *behind* that `(`: `(?<=(?=)\()`. We [can do that directly](https://regex101.com/r/I8mYaH/2) if we place `\(` after those two lookaheads though. – InSync Aug 11 '23 at 09:10
  • Actually the 2nd regex is more clear to me than the 1st one (I'm still a little fuzzy on that lookbehind, but I'll work on that); anyway what I meant is that I understand the last `\(` "consumes" the first opening parenthesis (or rather advances the cursor). Is that right? – Swifty Aug 11 '23 at 09:20
  • @Swifty Actually, it moves the "pointer" 1 character back to the left, since we're inside a lookbehind (which consumes nothing). – InSync Aug 11 '23 at 09:22
  • Thanks! I now realize I was horribly mixing all those lookarounds in my mind; after some more reading and tests, I better understand them (and your "nested" lookarounds), along with the potency of fitting capture groups in lookarounds. – Swifty Aug 11 '23 at 10:07