import re

strings = [
    r"C:\Photos\Selfies\1|",
    r"C:\HDPhotos\Landscapes\2|",
    r"C:\Filters\Pics\12345678|",
    r"C:\Filters\Pics2\00000000|",
    r"C:\Filters\Pics2\00000000|XAV7"
]
for string in strings:
    matchptrn = re.match(r"(?P<file_path>.*)(?!\d{8})", string)
    if matchptrn:
        print("FILE PATH = "+matchptrn.group('file_path'))
I am trying to get this regular expression with a lookahead to work the way I thought it would. Examples of lookarounds on most websites are pretty basic string matches, e.g. not matching 'bar' if it is preceded by 'foo' as an example of a negative lookbehind.
My goal is to capture the actual file path in the group file_path only if the string does NOT have an 8-digit number just before the pipe symbol |, and to match anything after the pipe symbol in another group (something I haven't implemented here).
So in the above example it should match only the first two strings:
C:\Photos\Selfies\1
C:\HDPhotos\Landscapes\2
In case of the last string
C:\Filters\Pics2\00000000|XAV7
I'd like to match C:\Filters\Pics2\00000000
in <file_path>
and match XAV7
in another named group.
(This is something I can figure out on my own if I get some help with the negative lookahead.)
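To make the intended behaviour concrete, here is a minimal sketch of one possible reading of the goal. It assumes a string is rejected only when it ends with a backslash, exactly eight digits, and the pipe, with nothing after the pipe; that assumption is what lets the XAV7 string through. The group name `suffix` and the helper `split_entry` are hypothetical placeholders, not names from my real code.

```python
import re

# Assumed rule: reject only when the string ends with "\" + 8 digits + "|".
# "suffix" is a placeholder name for the second group.
PATTERN = re.compile(
    r"^(?!.*\\\d{8}\|$)"       # negative lookahead, tested once at position 0
    r"(?P<file_path>[^|]*)"    # everything up to the first pipe
    r"\|(?P<suffix>.*)$"       # the pipe, then whatever follows it
)

def split_entry(entry):
    """Return (file_path, suffix), or None if the entry is rejected."""
    m = PATTERN.match(entry)
    return (m.group("file_path"), m.group("suffix")) if m else None

samples = [
    r"C:\Photos\Selfies\1|",
    r"C:\HDPhotos\Landscapes\2|",
    r"C:\Filters\Pics\12345678|",
    r"C:\Filters\Pics2\00000000|",
    r"C:\Filters\Pics2\00000000|XAV7",
]
for s in samples:
    print(s, "->", split_entry(s))
```

Under that assumption the third and fourth samples return None and the other three are split at the pipe. If the rule should instead reject any 8-digit number before the pipe, the `$` inside the lookahead would have to go.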
Currently, with the pattern in my snippet, <file_path> matches everything, which makes sense since .* is greedy. I want it to capture only if the last part of the string before the pipe symbol is NOT an 8-digit number.
OUTPUT OF CODE SNIPPET PASTED BELOW
FILE PATH = C:\Photos\Selfies\1|
FILE PATH = C:\HDPhotos\Landscapes\2|
FILE PATH = C:\Filters\Pics\12345678|
FILE PATH = C:\Filters\Pics2\00000000|
FILE PATH = C:\Filters\Pics2\00000000|XAV7
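The output above is consistent with .* consuming the whole string before the lookahead is ever tested. A small check, as a sketch using one of the sample strings, of where the group ends and what the lookahead sees at that position:

```python
import re

s = r"C:\Filters\Pics\12345678|"
m = re.match(r"(?P<file_path>.*)(?!\d{8})", s)

# Greedy .* runs to the end of the string, so the lookahead is tested there.
group_reaches_end = m.end("file_path") == len(s)
print(group_reaches_end)

# Nothing follows the end of the string, so "not followed by 8 digits"
# holds trivially and the engine never needs to backtrack.
lookahead_ok_at_end = re.compile(r"(?!\d{8})").match(s, len(s)) is not None
print(lookahead_ok_at_end)
```

Both checks print True, so the lookahead in this position never gets a chance to look at the digits before the pipe.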
Adding \\ before the lookahead
matchptrn = re.match(r"(?P<file_path>.*)\\(?!\d{8})", string)
if matchptrn:
    print("FILE PATH = "+matchptrn.group('file_path'))
makes things worse as the output is
FILE PATH = C:\Photos\Selfies
FILE PATH = C:\HDPhotos\Landscapes
FILE PATH = C:\Filters
FILE PATH = C:\Filters
FILE PATH = C:\Filters
Can someone please explain this as well?
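My best guess at what happens with the \\ variant is backtracking: the greedy .* gives characters back until a backslash can match, and the lookahead is then tested right after that backslash. The sketch below imitates that by hand; the `trace` helper is hypothetical and only approximates what the regex engine does, visiting backslashes from right to left.

```python
import re

LOOKAHEAD = re.compile(r"(?!\d{8})")

def trace(s):
    """Print each backslash tried, right to left; return the accepted prefix."""
    for i in reversed([k for k, ch in enumerate(s) if ch == "\\"]):
        # Test the negative lookahead at the position just after this backslash.
        ok = LOOKAHEAD.match(s, i + 1) is not None
        print(repr(s[:i]), "then", repr(s[i + 1:]), "->", "accepted" if ok else "rejected")
        if ok:
            return s[:i]
    return None

s = r"C:\Filters\Pics\12345678|"
by_hand = trace(s)
by_regex = re.match(r"(?P<file_path>.*)\\(?!\d{8})", s).group("file_path")
print(by_hand == by_regex)
```

For this string the last backslash is rejected because eight digits follow it, so the match falls back to the previous backslash and the group shrinks to C:\Filters, which would explain the last three lines of output.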