I'm having a tough time figuring out a regular expression (something I sadly have almost no experience with) for the following problem:
- The text starts with a given prefix (let's say it's `ab4`).
- The text has a body of 4 blocks of 4 characters (that's what the `4` in `ab4` stands for), where each character can be an ASCII alphanumeric, whitespace, bracket, hyphen or dot (basically `a-zA-Z0-9 ()-.`). Example: `abcd`, `.b a`, `    `, `b(a.)` are all valid single blocks.
- The text body can be empty (`ab4` is the only content) or contain up to the four blocks (`ab4xxxx`, `ab4xxxxxxxx`, `ab4xxxxxxxxxxxx`, `ab4xxxxxxxxxxxxxxxx`, with `x` being a valid character).
- The text ends with a CRLF (carriage return plus line feed, `\r\n`). The ending counts as a terminator and is NOT part of the body.
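To make the rules concrete, here is a minimal sketch of the same checks in plain C++ without any regex. The helper names `is_block_char` and `is_valid_message` are made up for illustration, and the sketch assumes the string holds exactly one message with nothing before the prefix or after the line ending.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if c may appear inside a block
// (ASCII letter, digit, space, bracket, hyphen or dot).
bool is_block_char(char c) {
    const std::string extra = " ()-.";
    const bool alnum = (c >= 'a' && c <= 'z') ||
                       (c >= 'A' && c <= 'Z') ||
                       (c >= '0' && c <= '9');
    return alnum || extra.find(c) != std::string::npos;
}

// Hypothetical helper: checks prefix, body (0 to 4 blocks of 4 characters)
// and the terminating "\r\n". Assumes text is exactly one message.
bool is_valid_message(const std::string& text) {
    const std::string prefix = "ab4";
    const std::string ending = "\r\n";
    if (text.size() < prefix.size() + ending.size()) return false;
    if (text.compare(0, prefix.size(), prefix) != 0) return false;
    if (text.compare(text.size() - ending.size(), ending.size(), ending) != 0) return false;

    // The body is whatever sits between the prefix and the line ending.
    const std::size_t body_len = text.size() - prefix.size() - ending.size();
    if (body_len % 4 != 0 || body_len > 16) return false;

    for (std::size_t i = 0; i < body_len; ++i) {
        if (!is_block_char(text[prefix.size() + i])) return false;
    }
    return true;
}
```

Under these assumptions, `is_valid_message("ab4\r\n")` accepts the empty body, while a body whose length is not a multiple of 4 is rejected.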
So far I have come up with
.*ab4([a-zA-Z0-9 ()-.]{4}){1,4}\\r\\n.*
I use regular expressions 101 to verify my regex before I add it to my C++ code. However if I input
ab4aaa bbb ccc ddd \r\n
I get the following stats:
Full match:
0-25 'ab4aaa bbb ccc ddd \r\n'
Group 1.:
15-19 'ddd '
The regex verifier tells me that
A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
but frankly I have no idea what this means. I tried `(([a-zA-Z0-9 ()-.]{4}){1,4})`, which didn't change much.
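The behaviour the warning describes can be reproduced with `std::regex`. This is a minimal sketch, not a definitive answer: the function names `last_iteration_only` and `whole_body` are hypothetical, the input is assumed to contain a real CR LF pair (not the four typed characters `\r\n`), and the hyphen is moved to the end of the character class so it is taken literally (written as `)-.` it forms a range that also admits `*`, `+` and `,`).

```cpp
#include <regex>
#include <string>

// The capturing group itself is repeated, so group 1 is overwritten on
// every iteration and ends up holding only the last 4-character block.
std::string last_iteration_only(const std::string& text) {
    static const std::regex re("ab4([a-zA-Z0-9 ().-]{4}){1,4}\\r\\n");
    std::smatch m;
    return std::regex_match(text, m, re) ? m[1].str() : std::string();
}

// An outer capturing group wraps the repetition (the inner group is made
// non-capturing), so group 1 now holds the whole body as one string.
std::string whole_body(const std::string& text) {
    static const std::regex re("ab4((?:[a-zA-Z0-9 ().-]{4}){1,4})\\r\\n");
    std::smatch m;
    return std::regex_match(text, m, re) ? m[1].str() : std::string();
}
```

Neither variant produces one group per block: the first keeps only the final block, the second keeps all blocks joined together.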
I'm looking for a better grouping, namely one that sets the 4 blocks apart as separate groups. For the example above I'm expecting:
Full match:
0-25 'ab4aaa bbb ccc ddd \r\n'
- Group 1.:
0-3 'aaa '
- Group 2.:
4-7 'bbb '
- Group 3.:
8-11 'ccc '
- Group 4.:
12-15 'ddd '
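One way to get a separate group per block is to spell out four capturing groups explicitly instead of repeating a single one. The following is a rough sketch under a few assumptions: `kBlock`, `kMessage` and `split_blocks` are names invented here, the input holds exactly one message ending in a real CR LF, and the hyphen sits at the end of the character class so it is literal. The optional groups are nested so that a block can only appear if the previous one did.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One 4-character block as its own capturing group (hypothetical name).
const std::string kBlock = "([a-zA-Z0-9 ().-]{4})";

// ab4, then up to four blocks, then CR LF. Expands to:
// ab4(?:(B)(?:(B)(?:(B)(B)?)?)?)?\r\n
const std::regex kMessage(
    "ab4(?:" + kBlock + "(?:" + kBlock + "(?:" + kBlock + kBlock + "?)?)?)?\\r\\n");

// Fills blocks with the groups that took part in the match.
// Returns false (and leaves blocks empty) if text is not a valid message.
bool split_blocks(const std::string& text, std::vector<std::string>& blocks) {
    blocks.clear();
    std::smatch m;
    if (!std::regex_match(text, m, kMessage)) return false;
    for (std::size_t i = 1; i < m.size(); ++i) {
        if (m[i].matched) blocks.push_back(m[i].str());  // skip unused groups
    }
    return true;
}
```

An alternative with the same result would be to capture the whole body in a single group and cut it into 4-character pieces with `substr`, which keeps the pattern shorter at the cost of doing the split in code.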