Regex pattern: (,)|(\([^()]{0,20}\))
Intuition behind this pattern:
(,) matches every comma and stores it in capturing group 1.
(\([^()]{0,20}\)) matches a pair of parentheses with at most 20 characters in between (none of them parentheses) and stores it in capturing group 2.
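Because the alternation consumes a short parenthesized span as a single group 2 match, the commas inside it are never matched by group 1. Keeping only the group 1 matches therefore excludes commas inside parentheses that hold 20 or fewer characters.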
Now to find the indices for these matches, use re.finditer() combined with Match.start() and Match.group() to find the starting index for each match from group 1:
import re
string = """Get index of this comma, but (not this , comma). Get other commas like , or ,or, 1,1 2 ,2.
(not this ,) BUT (get index of this comma, if more than 20 characters are inside the parentheses)"""
# Raw string so that \( and \) reach the regex engine without invalid-escape warnings.
indices = [m.start(1) for m in re.finditer(r'(,)|(\([^()]{0,20}\))', string) if m.group(1)]
print(indices)
# > [23, 71, 76, 79, 82, 87, 132]
print([string[index] for index in indices])
# > [',', ',', ',', ',', ',', ',', ',']
m.start(1) returns the starting index of group 1. re.finditer() yields one match object per match of the whole pattern, whichever alternative produced it, so the if m.group(1) filter keeps only the matches in which group 1 participated. For matches produced by group 2, m.group(1) is None (and m.start(1) would be -1).
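To make the filtering step concrete, here is a minimal sketch that prints what each match object holds. The sample string is a made-up example, not taken from the question.

```python
import re

pattern = re.compile(r'(,)|(\([^()]{0,20}\))')
sample = "a, (b, c) d,"  # hypothetical sample string

# One tuple per match: (group 1, group 2, start of group 1)
rows = [(m.group(1), m.group(2), m.start(1)) for m in pattern.finditer(sample)]
for row in rows:
    print(row)
# > (',', None, 1)
# > (None, '(b, c)', -1)
# > (',', None, 11)
```

The middle row is the parenthesized span: group 1 did not participate, so m.group(1) is None and the if m.group(1) filter drops it.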
Edit: This ignores parentheses with 20 or fewer characters inside, which is not consistent with your first statement but is consistent with what the example explains. If you want fewer than 20, just use {0,19}.
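If the limit needs to vary, one option is to build the quantifier from a parameter. This is only a sketch: the function name comma_indices and the parameter max_inside are hypothetical, and the test string is constructed to have exactly 20 characters between the parentheses.

```python
import re

def comma_indices(text, max_inside=20):
    """Indices of commas that are not inside parentheses holding at most max_inside characters."""
    # max_inside is substituted into the {0,N} quantifier of group 2.
    pattern = re.compile(r'(,)|(\([^()]{0,%d}\))' % max_inside)
    return [m.start(1) for m in pattern.finditer(text) if m.group(1)]

text = "(" + "x" * 9 + "," + "x" * 10 + ")"  # exactly 20 characters inside

print(comma_indices(text, 20))
# > []
print(comma_indices(text, 19))
# > [10]
```

With {0,20} the whole span is consumed by group 2 and the comma is skipped; with {0,19} the span is too long for group 2, so the comma at index 10 is reported.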