As a learning exercise, I like to compare two regular expressions that do the same thing.
In this case, I want to extract the sequences of numbers from strings like these:
CC_nums = [
    '2341-3421-5632-0981-009',
    '4521-9085-3948-2543-89-9',
]
The result I want after capturing with a regex is:
['2341', '3421', '5632', '0981', '009']
['4521', '9085', '3948', '2543', '89', '9']
I understand that this works in Python:
import re

for number in CC_nums:
    print(re.findall(r'(\d+)', number))
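A self-contained version of that loop, written with print() so it runs under Python 3 as well. It assumes the second string starts with 4521, as the expected output suggests. With exactly one capturing group, findall returns a flat list of strings for each input.

```python
import re

# Assumption: second string starts with '4521', matching the expected output
CC_nums = [
    '2341-3421-5632-0981-009',
    '4521-9085-3948-2543-89-9',
]

for number in CC_nums:
    # One capturing group, so findall returns the group's text for every match
    print(re.findall(r'(\d+)', number))
```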
To understand this more deeply, I then tried the following:
for number in CC_nums:
    print(re.findall(r'\s*(?:(\d+)\D+)+(\d+)\s*', number))
...which returns:
[('0981', '009')]
[('89', '9')]
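One way to see where ('0981', '009') comes from is to run the same pattern with re.search and inspect the groups of that single match. This is a sketch using only the standard re module; the values follow from the greedy (?:...)+ backing off one repetition once the final 009 has no non-digit after it.

```python
import re

pattern = r'\s*(?:(\d+)\D+)+(\d+)\s*'
m = re.search(pattern, '2341-3421-5632-0981-009')

# Group 1 sits inside the repeated (?:...)+; each repetition overwrites it,
# so only the text captured by the last repetition survives
print(m.group(1))  # 0981
# Group 2 is the final run of digits after the repetition
print(m.group(2))  # 009
# With more than one group, findall reports m.groups() for each match
print(m.groups())  # ('0981', '009')
```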
Two questions:
Firstly, why does the second one return a list of tuples instead of a flat list of strings?
Secondly, why does the second one not match the other groups of digits, such as 2341, 3421, etc.?
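For reference, the shape of findall's result depends on how many capturing groups the pattern contains, as described in the re module documentation. A small sketch with a made-up string, text:

```python
import re

text = '12-34 56-78'  # hypothetical sample input

# No groups: a list of whole-match strings
print(re.findall(r'\d+-\d+', text))      # ['12-34', '56-78']
# One group: a list of strings, each the group's text
print(re.findall(r'(\d+)-\d+', text))    # ['12', '56']
# Two or more groups: a list of tuples, one tuple per match
print(re.findall(r'(\d+)-(\d+)', text))  # [('12', '34'), ('56', '78')]
```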
I know that findall returns the capturing groups of non-overlapping matches, so I tried to avoid overlap. Because each (\d+) matches a separate run of digits, I thought the capturing groups would not overlap, so I did not expect this to be an issue.
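A quick check with re.finditer, sketched here with the same pattern, suggests that the relevant overlap is between whole matches rather than groups: the greedy (?:...)+ consumes the entire string in one match, so there is nothing left for findall to match afterwards.

```python
import re

pattern = r'\s*(?:(\d+)\D+)+(\d+)\s*'
number = '2341-3421-5632-0981-009'

# finditer yields match objects, so we can see how much text each match covers
matches = list(re.finditer(pattern, number))
print(len(matches))       # 1
print(matches[0].span())  # (0, 23)
print(len(number))        # 23
```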