Regex With Lookahead For Fixed Length String

Question

import re

strings = [
    r"C:\Photos\Selfies\1|",
    r"C:\HDPhotos\Landscapes\2|",
    r"C:\Filters\Pics\12345678|",
    r"C:\Filters\Pics2\00000000|",
    r"C:\Filters\Pics2\00000000|XAV7"
    ]
    
for string in strings:
    matchptrn = re.match(r"(?P<file_path>.*)(?!\d{8})", string)
    if matchptrn:
        print("FILE PATH = "+matchptrn.group('file_path'))

I am trying to get this regular expression with a lookahead to work the way I thought it would. Lookahead examples on most websites seem to be pretty basic string matches, e.g. not matching 'bar' if it is preceded by 'foo' as an example of a negative lookbehind.

My goal is to capture the actual file path in the group file_path only if the string does NOT have an 8-digit number just before the pipe symbol |, and to match anything after the pipe symbol in another group (something I haven't implemented here).

So in the above example it should match only the first two strings

C:\Photos\Selfies\1
C:\HDPhotos\Landscapes\2

In case of the last string

C:\Filters\Pics2\00000000|XAV7

I'd like to match C:\Filters\Pics2\00000000 in <file_path> and match XAV7 in another named group.
(This is something I can figure out on my own if I get some help with the negative look ahead)

Currently <file_path> matches everything, which makes sense since .* is greedy. I want it to capture only if the last part of the string before the pipe symbol is NOT an 8-digit number.

OUTPUT OF CODE SNIPPET PASTED BELOW

FILE PATH = C:\Photos\Selfies\1|
FILE PATH = C:\HDPhotos\Landscapes\2|
FILE PATH = C:\Filters\Pics\12345678|
FILE PATH = C:\Filters\Pics2\00000000|
FILE PATH = C:\Filters\Pics2\00000000|XAV7

Adding \\ before the lookahead

matchptrn = re.match(r"(?P<file_path>.*)\\(?!\d{8})", string)
if matchptrn:
    print("FILE PATH = "+matchptrn.group('file_path'))

makes things worse as the output is

FILE PATH = C:\Photos\Selfies
FILE PATH = C:\HDPhotos\Landscapes
FILE PATH = C:\Filters
FILE PATH = C:\Filters
FILE PATH = C:\Filters

Can someone please explain this as well?

Does [Regex lookahead, lookbehind and atomic groups](https://stackoverflow.com/questions/2973436/regex-lookahead-lookbehind-and-atomic-groups?rq=1) give you the answer you are looking for — itprorh66, Dec 06 '20 at 16:38
@itprorh66 unfortunately, no. I went through the answer but I guess I'm not smart enough to figure out what's wrong. I've tested several combinations through trial and error but none of them seem to work. — Dhiwakar Ravikumar, Dec 06 '20 at 16:45
I think I managed to figure this one out after reading a bit more --> (?![\\\:\w\d]+\\\d{8}\|.*)(?P^.*) I am still testing it so I could be dead wrong. — Dhiwakar Ravikumar, Dec 06 '20 at 17:43

score 1 · Accepted Answer · answered Dec 06 '20 at 18:34

You can use

^(?!.*\\\d{8}\|$)(?P<file_path>.*)\|(?P<suffix>.*)

See the regex demo.

Details

^ - start of a string
(?!.*\\\d{8}\|$) - fail the match if the string contains \ followed by eight digits and then | at the end of the string
(?P<file_path>.*) - Group "file_path": any zero or more chars other than line break chars, as many as possible
\| - a pipe
(?P<suffix>.*) - Group "suffix": the rest of the string, any zero or more chars other than line break chars, as many as possible.
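As a minimal sketch, the same pattern can also be written with re.VERBOSE so that each piece listed above carries its own comment. The helper name split_path is hypothetical and only there for illustration; the pattern itself is the one from this answer.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part is annotated.
PATTERN = re.compile(
    r"""
    ^                    # start of string
    (?!.*\\\d{8}\|$)     # reject: backslash + eight digits + pipe at the very end
    (?P<file_path>.*)    # greedy: everything up to the last pipe
    \|                   # the pipe separator
    (?P<suffix>.*)       # whatever follows the pipe (may be empty)
    """,
    re.VERBOSE,
)


def split_path(line):
    """Hypothetical helper: return (file_path, suffix), or None if rejected."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return m.group("file_path"), m.group("suffix")


if __name__ == "__main__":
    for line in (r"C:\Photos\Selfies\1|",
                 r"C:\Filters\Pics\12345678|",
                 r"C:\Filters\Pics2\00000000|XAV7"):
        print(line, "->", split_path(line))
```

Because file_path is greedy, the split happens at the last pipe in the string, so a suffix that itself contains a pipe would end up partly inside file_path.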

See the Python demo:

import re
strings = [
    r"C:\Photos\Selfies\1|",
    r"C:\HDPhotos\Landscapes\2|",
    r"C:\Filters\Pics\12345678|",
    r"C:\Filters\Pics2\00000000|",
    r"C:\Filters\Pics2\00000000|XAV7"
    ]
    
for string in strings:
    matchptrn = re.match(r"(?!.*\\\d{8}\|$)(?P<file_path>.*)\|(?P<suffix>.*)", string)
    if matchptrn:
        print("FILE PATH = {}, SUFFIX = {}".format(*matchptrn.groups()))

Output:

FILE PATH = C:\Photos\Selfies\1, SUFFIX = 
FILE PATH = C:\HDPhotos\Landscapes\2, SUFFIX = 
FILE PATH = C:\Filters\Pics2\00000000, SUFFIX = XAV7
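
As for why the two attempts in the question behave the way they do, here is a rough sketch, assuming the usual greedy-then-backtrack behaviour of Python's re module. The names samples, attempt1 and attempt2 are only for illustration. In the first pattern the lookahead is tested after .* has already consumed the whole string, so it can never fail. In the second, the engine backtracks to the last backslash that is not followed by eight digits, which truncates the path instead of rejecting the string.

```python
import re

samples = [
    r"C:\Photos\Selfies\1|",
    r"C:\Filters\Pics\12345678|",
]

# Attempt 1: .* is greedy and consumes the entire string first. The negative
# lookahead is then checked at the end of the string, where eight digits
# cannot follow, so it succeeds at once and nothing is ever rejected.
attempt1 = re.compile(r"(?P<file_path>.*)(?!\d{8})")

# Attempt 2: .* gives characters back until a backslash is found that is NOT
# followed by eight digits. The group is cut at that backslash rather than
# the whole match failing.
attempt2 = re.compile(r"(?P<file_path>.*)\\(?!\d{8})")

for s in samples:
    print("attempt 1:", attempt1.match(s).group("file_path"))
    print("attempt 2:", attempt2.match(s).group("file_path"))
```

This is why the lookahead in the accepted pattern is placed at the start of the string and scans forward with .* of its own: it inspects the whole string before any group has captured anything.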
