
This is part of a larger regex. The intention is for the entire string labeled Test to match and fall into the capture group, except for the first opening parenthesis and the last three closing parentheses.

As written, my understanding is that the regex should capture a string between one opening parenthesis ( and three closing parentheses ))).

Regex: \(([^\)\)\)]*)\)\)\)[\s]*,?

Test: ((Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str))) ,

When I use Python's standard regex library (re), only (Test_2, (1.9, 33, test,, str))) , matches the regex, not the entire string. I must be missing something here, but I'm having a hard time figuring out what it is and how to resolve it.

>>> import re
>>> test = r"((Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str))) ,"
>>> re.compile(r"\(([^\)\)\)]*)\)\)\)[\s]*,?").search(test).group(0)
'(Test_2, (1.9, 33, test,, str))) ,'
alomeli
  • `\)\)\)` needs to match 3 `)`s. `[^\)\)\)]*` matches any 0 or more chars other than `)`. That is why. Your regex is equal to `\(([^)]*)\){3}\s*,?` – Wiktor Stribiżew Jan 24 '20 at 22:00
  • Beside the point, but group 0 is the entire match. I think you want group 1. – wjandrea Jan 24 '20 at 22:07
  • @wjandrea, thanks - you're correct, but for the example I was showing the match, not the capture (I use the capture elsewhere in code, but thought the match itself was more relevant to the question). – alomeli Jan 24 '20 at 22:10
  • The tag info for the regex tag has a list of common gotchas, including the one here (namely, that `[aaa]` is equivalent to `[a]`, not `aaa`); it's worth a read to help demystify some of what's going on. – manveti Jan 24 '20 at 22:12
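The following sketch checks the point made in these comments. It compares the character class from the question with [^)] over every ASCII character. It also compares the question's regex with the simplified form `\(([^)]*)\){3}\s*,?` from the first comment. The variable names are only illustrative.

```python
import re

test = r"((Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str))) ,"

# A character class matches ONE character from a set,
# so repeating ")" inside it changes nothing.
tripled = re.compile(r"[^\)\)\)]")
single = re.compile(r"[^)]")
same = all(
    bool(tripled.fullmatch(ch)) == bool(single.fullmatch(ch))
    for ch in map(chr, range(128))
)
print(same)  # True

# The question's regex and the simplified form find the same match.
original = re.compile(r"\(([^\)\)\)]*)\)\)\)[\s]*,?")
simplified = re.compile(r"\(([^)]*)\){3}\s*,?")
print(original.search(test).group(0))    # (Test_2, (1.9, 33, test,, str))) ,
print(simplified.search(test).group(0))  # (Test_2, (1.9, 33, test,, str))) ,
```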

3 Answers


Your regex requires three closing brackets in a row. In your example, the first part ends with only two ), so only the second part is matched.

See https://regex101.com/r/eTEja1/1
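To see what "only two )" means in practice, you can list the length of every run of consecutive ) in the test string. This is a minimal sketch; the name run_lengths is illustrative.

```python
import re

test = r"((Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str))) ,"

# Length of every run of consecutive ")" characters, left to right.
run_lengths = [len(run) for run in re.findall(r"\)+", test)]
print(run_lengths)  # [2, 3]
```

Only the final run is three characters long. Because [^\)\)\)]* cannot step over a ), the leftmost position where the regex can succeed is the ( just before Test_2.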

UPD:

If you want to capture the whole string, you should use this pattern:

\(([\s\S]*?)\){3}[\s]*,?

  • [\s\S] matches any character, including line breaks (. also works here, because the test string contains no line breaks)

  • *? makes the quantifier lazy (non-greedy), which stops it from capturing all the text up to the last ))). It captures the shortest possible chunk instead.

See https://regex101.com/r/DoRnjW/3
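Here is a sketch of this pattern in Python using re.search. It checks that the whole test string is matched, and that swapping [\s\S] for . gives the same group 1 on this particular input. The variable names are illustrative.

```python
import re

test = r"((Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str))) ,"
pattern = re.compile(r"\(([\s\S]*?)\){3}[\s]*,?")

m = pattern.search(test)
print(m.group(0) == test)  # True: the whole string matches
print(m.group(1))  # (Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str

# "." works here too, because test contains no line breaks.
dot_pattern = re.compile(r"\((.*?)\){3}[\s]*,?")
print(dot_pattern.search(test).group(1) == m.group(1))  # True
```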

Heavy
  • Why wouldn't the first part still be captured along with the second? The regex specifies a group that starts with one bracket and ends with three. Since the first part doesn't end with three but starts with one, what prevents it from being included in the match? – alomeli Jan 24 '20 at 22:11
  • @alomeli I told you, `[^))))))))))))]` is equal to `[^)]` – Wiktor Stribiżew Jan 24 '20 at 22:14
  • OP is expecting to match the whole string, including both parts – wjandrea Jan 24 '20 at 22:15
  • Because `[^\)\)\)]` prevents closing brackets from appearing inside the match. Also, it doesn't make sense, because it could be replaced with just `[^\)]`. It's not clear what you wanted to achieve here. – Heavy Jan 24 '20 at 22:15
  • Quite the contrary, it is clear. OP wants `\(((?:(?!\){3}).)*?)\){3}\s*,?`, [demo](https://regex101.com/r/HY6pVv/1). But `[\s\S]*?` or `.*?` will be enough. – Wiktor Stribiżew Jan 24 '20 at 22:23
  • Why use `[\s\S]` instead of `.`? – wjandrea Jan 24 '20 at 22:34
  • `.` doesn't match line breaks by default. For the current example it makes no difference, but I always use `[\s\S]` just in case. See also https://stackoverflow.com/questions/44246215/is-s-s-same-as-dot – Heavy Jan 24 '20 at 22:39
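Two points from these comments can be checked with a short sketch. First, the tempered pattern from the comments matches the whole test string. Second, on a hypothetical two-line version of the input, . stops at the line break unless re.DOTALL is set, while [\s\S] crosses it. The multiline input is made up for illustration.

```python
import re

test = r"((Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str))) ,"

# Tempered token from the comments: consume one char at a time,
# but never start consuming at a position where ")))" begins.
tempered = re.compile(r"\(((?:(?!\){3}).)*?)\){3}\s*,?")
print(tempered.search(test).group(0) == test)  # True

# Hypothetical input: the same data split over two lines.
multiline = test.replace("), (Test_2", "),\n(Test_2")

dot = re.compile(r"\((.*?)\){3}\s*,?")
any_char = re.compile(r"\(([\s\S]*?)\){3}\s*,?")
dot_all = re.compile(r"\((.*?)\){3}\s*,?", re.DOTALL)

print(dot.search(multiline).group(0))  # (Test_2, (1.9, 33, test,, str))) ,
print(any_char.search(multiline).group(0) == multiline)  # True
print(dot_all.search(multiline).group(0) == multiline)   # True
```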

Regex: \((.+)\)\)\)

Output (group 1): (Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str

It removes the first opening parenthesis and the last three closing parentheses, as you specified. But I believe you may also find this one useful:

Regex: \((.+\)\))\)

Output (group 1): (Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str))

It removes the first parenthesis and only the last of the three closing parentheses. Because .+ is greedy, it grabs as much text as it can while still letting the rest of the pattern match. So there is no need to exclude parentheses that are not the final ones.
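Both patterns from this answer can be tried directly. This minimal sketch uses re.search and prints group 1, which is what the outputs above show; the full match additionally includes the stripped parentheses.

```python
import re

test = r"((Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str))) ,"

# Greedy .+ runs to the end, then backtracks to the last ")))".
first = re.search(r"\((.+)\)\)\)", test).group(1)
# Same idea, but two of the three closing ")" stay inside the group.
second = re.search(r"\((.+\)\))\)", test).group(1)

print(first)   # (Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str
print(second)  # (Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str))
```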

rvbarreto

[^\)\)\)] is equivalent to [^)], so the captured text cannot contain a closing parenthesis. Simply replace [^\)\)\)]* with .*.

Also you can simplify \)\)\) to \){3} and [\s] to \s.

import re

test = r"((Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str))) ,"
pattern = r"\((.*)\){3}\s*,?"
regex = re.compile(pattern)
m = regex.match(test)

print(m.group(0) == test)
print(m.group(0))
print(m.group(1))

Output:

True
((Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str))) ,
(Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str

Note that .* is greedy. If test contains a second occurrence, the capture can run past the first match of \){3}\s*,? up to the last one. You could avoid that by making it non-greedy: .*?
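Here is a sketch of that caveat, using a hypothetical input where the record appears twice. The greedy .* yields one match covering both records. The lazy .*? stops at the first ))) and yields one match per record.

```python
import re

record = r"((Test_1, (3.7, 88, test,, str)), (Test_2, (1.9, 33, test,, str))) ,"
two_records = record + " " + record  # hypothetical input, record repeated

greedy = re.compile(r"\((.*)\){3}\s*,?")
lazy = re.compile(r"\((.*?)\){3}\s*,?")

print(greedy.match(two_records).group(0) == two_records)  # True: swallows both
print(lazy.match(two_records).group(0) == record)         # True: stops after the first
print(len(lazy.findall(two_records)))                     # 2
```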

wjandrea