Used Python 2.7 to reproduce. This answer shows the issue with not found backreferences for re.sub
in Python 2.7 and some patterns to fix.
Both patterns compile
import re
# both seem identical
regex1 = '^(.+?)(\d+-\d+)?$'
regex2 = '^(.+?)(\d+-\d+)?$'
# also the compiled pattern is identical, see hash
re.compile(regex1) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
re.compile(regex2) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
Note: The compiled pattern using re.compile()
saves time when re-using multiple times like in this loop.
Fix: test for groups found
The error-message indicates that there are groups that aren't matched.
Put it other: In the matching result of re.sub
(docs to 2.7) there are references to groups like the second capturing group (\2
) that have not been found or captured in the given string input:
sre_constants.error: unmatched group
To fix this, we should test on groups that were found in the match.
Therefore we use re.match(regex, str)
or the compiled variant pattern.match(str)
to create a Match
object, then Match.groups()
to return all found groups as tuple.
import re
regex = '^(.+?)(\d+-\d+)?$' # a key followed by optional digits-range
pattern = re.compile(regex) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
def dict_with_expanded_digits(fields_list):
entry_list = []
for fields in fields_list:
(key_digits_range, value) = fields.split() # a pair of ('key0-1', 'value')
# test for match and groups found
match = pattern.match(key_digits_range)
print("DEBUG: groups:", match.groups()) # tuple containing all the subgroups of the match,
# watch: the 3rd iteration has only group(1), while group(2) is None
# break to next iteration here, if not maching pattern
if not match:
print('ERROR: no valid key! Will not add to dict.', fields)
continue
# if no 2nd group, only a single key,value
if not match.group(2):
print('WARN: key without range! Will add as single entry:', fields)
entry_list.append( (key_digits_range, value) )
continue # stop iteration here and continue with next
key = pattern.sub(r'\1', key_digits_range)
index_range = pattern.sub(r'\2', key_digits_range)
# no strip needed here
(start, end) = index_range.split('-')
for index in range(int(start), int(end)+1):
expanded_key = "{}{}".format(key, index)
entry = (expanded_key, value) # use tuple for each field entry (key, value)
entry_list.append(entry)
return dict([e for e in entry_list])
list_a = [
'abcd1-2 4d4e', # 2 entries
'xyz0-1 551', # 2 entries
'foo 3ea', # 1 entry
'bar1 2bd', # 1 entry
'mc-mqisd0-2 77a' # 3 entries
]
dict_a = dict_with_expanded_digits(list_a)
print("INFO: resulting dict with length: ", len(dict_a), dict_a)
assert len(dict_a) == 9
Prints:
('DEBUG: groups:', ('abcd', '1-2'))
('DEBUG: groups:', ('xyz', '0-1'))
('DEBUG: groups:', ('foo', None))
('WARN: key without range! Will add as single entry:', 'foo 3ea')
('DEBUG: groups:', ('bar1', None))
('WARN: key without range! Will add as single entry:', 'bar1 2bd')
('DEBUG: groups:', ('mc-mqisd', '0-2'))
('INFO: resulting dict with length: ', 9, {'bar1': '2bd', 'foo': '3ea', 'mc-mqisd2': '77a', 'mc-mqisd0': '77a', 'mc-mqisd1': '77a', 'xyz1': '551', 'xyz0': '551', 'abcd1': '4d4e', 'abcd2': '4d4e'})
Note on added improvements
- renamed function and variables to express intend
- used tuples where possible, e.g. assignment
(start, end)
- instead of
re.
methods used the equivalent methods of compiled pattern pattern.
- the guard-statement
if not match.group(2):
avoids expanding the field and just adds the key-value as is
- added
assert
to verify given list of 7 is expanded to dict of 9 as expected