Given a dataset as follows:
   id  vector_name
0   1  01,02,03,04
1   2  001,002,003
2   3  01,02,03
3   4  A, B, C
4   5  s01, s02, s02
5   6  E2702-2703,E2702-2703
6   7  03,05,06
7   8  05-08,09,10-12, 05-08
How could I write a regex to filter out the rows in column vector_name whose values are not composed of two-digit values? The correct format is 01, 02, 03, etc. Otherwise, it should return invalid vector name for the check column.
The expected result will be like this:
   id  vector_name
0   1  01,02,03,04
1   2  invalid vector name
2   3  01,02,03
3   4  invalid vector name
4   5  invalid vector name
5   6  invalid vector name
6   7  03,05,06
7   8  05-08,09,10-12, 05-08
The pattern I used, (\d+)(,\s*\d+)*, considers 001,002,003 valid.
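To make the problem concrete, here is a minimal sketch using only Python's re module. The names vector_names, loose, strict, item and check are hypothetical, chosen for illustration. It shows that \d+ accepts any number of digits, and tries a stricter pattern that requires exactly two digits per item and matches the whole string. The optional two-digit range (such as 05-08) is an assumption inferred from the last row of the expected result, not something stated in the rules above.

```python
import re

# The values of the vector_name column from the dataset above.
vector_names = [
    "01,02,03,04",
    "001,002,003",
    "01,02,03",
    "A, B, C",
    "s01, s02, s02",
    "E2702-2703,E2702-2703",
    "03,05,06",
    "05-08,09,10-12, 05-08",
]

# Original pattern: \d+ accepts one or more digits, so "001" passes.
loose = re.compile(r"(\d+)(,\s*\d+)*")

# Assumed stricter pattern: each item is exactly two digits, optionally
# followed by a two-digit range end such as 05-08 (inferred from the
# last row of the expected result).
item = r"\d{2}(?:-\d{2})?"
strict = re.compile(rf"{item}(?:,\s*{item})*")


def check(name):
    """Return name unchanged if the whole string matches, else a marker."""
    return name if strict.fullmatch(name) else "invalid vector name"


result = [check(name) for name in vector_names]

for name, checked in zip(vector_names, result):
    print(f"{name!r:28} -> {checked}")
```

With a pandas DataFrame, the same pattern could probably be applied through Series.str.fullmatch (available in pandas 1.1 and later), but that part is untested here.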
How could I do that? Thanks.