In a text containing lines with full paths, I need to match only lines whose file name doesn't start with the word 'TMP' (case insensitive).

In the next sample list, lines marked with "EXCLUDE" shouldn't be matched.

c:\folder1\TMP_file.ext----------EXCLUDE
c:\TMP_folder1\file.ext
c:\folder1\TMP_folder2\file.ext
c:\folder1/TMP_file.ext----------EXCLUDE
c:\file.ext
c:\TMP_file.ext------------------EXCLUDE
TMP_file.ext---------------------EXCLUDE
file.ext

I came up with the simple expression [^\\/\r\n]+$ (accepting '\' and '/' as directory separators) that successfully matches whole file names with their extensions, but I can't figure out how to add (?!...) to exclude the matches that start with 'tmp'.

Inverting the expression tmp[^\\/\r\n]+$ would also be a solution, but I don't know how to do that.

I know this question is similar to others (taking the risk of a downvote...) but I couldn't find a way to connect them with this problem.

oscar
  • https://regex101.com/r/b2KfB8/1 – Wiktor Stribiżew Sep 01 '20 at 18:53
  • @WiktorStribiżew Your answer is nearly right, but it works only in case-sensitive mode thanks to a fortunate "tmp" and "TMP" combination that slipped into the original question. I have corrected the question so that all "TMP" share the same case, and now your RegEx fails because of the "trap" folders that make this problem a bit hard. – oscar Sep 02 '20 at 06:10
  • Not a big problem, https://regex101.com/r/b2KfB8/2 – Wiktor Stribiżew Sep 02 '20 at 08:12
  • @WiktorStribiżew Your expression ^(?!(?:.*[/\\])?TMP(?![^\W_])[^/\n]*$).+ works! Could you please add it as answer so that I can accept it as final answer? – oscar Sep 02 '20 at 10:30

2 Answers

Regex is not the right tool here. It is better to iterate over the paths, take the base name of each one, and skip it if it starts with 'TMP'.

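import ntpath  # Windows path rules: splits on both '\' and '/' on any OS

def filter_tmp(text):
    # Yield each path whose file name does not start with 'TMP' (any case).
    for p in text.splitlines():
        if not ntpath.basename(p).upper().startswith('TMP'):
            yield p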

Then list(filter_tmp(text)) would give you the list of non-temp paths.
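
As a rough, self-contained illustration of this approach on the sample list from the question, here is a sketch that uses pathlib.PureWindowsPath so that both '\' and '/' count as separators on any OS. The names SAMPLE and keep_non_tmp are made up for this example, and the "EXCLUDE" markers are left out of the data. Note that it uses a plain prefix test, so a name such as 'TMPfile.ext' would be skipped as well.

```python
from pathlib import PureWindowsPath

# Sample paths from the question, without the "EXCLUDE" markers.
SAMPLE = r"""c:\folder1\TMP_file.ext
c:\TMP_folder1\file.ext
c:\folder1\TMP_folder2\file.ext
c:\folder1/TMP_file.ext
c:\file.ext
c:\TMP_file.ext
TMP_file.ext
file.ext"""


def keep_non_tmp(text):
    """Return the lines whose file name does not start with 'tmp' (any case)."""
    kept = []
    for line in text.splitlines():
        # PureWindowsPath treats both '\' and '/' as separators on any OS.
        name = PureWindowsPath(line).name
        if not name.lower().startswith("tmp"):
            kept.append(line)
    return kept


for path in keep_non_tmp(SAMPLE):
    print(path)
```

Only the paths whose last component is file.ext should be printed; the TMP_ folders do not matter because only the final component is inspected.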

Elazar
  • Does it mean that RegEx can't recursively find new matches inside previously found ones? I.e., if I successfully match file name in 'c:\folder1\TMP_file.ext', can't I run a new match expression ('TMP') inside that successful match? Could (?R) help? – oscar Sep 02 '20 at 06:27
  • No, I claim nothing here about what's impossible; the claim is about what is recommended and readable. – Elazar Sep 02 '20 at 10:42

You can use

(?i)^(?!(?:.*[/\\])?TMP(?![^\W_])[^/\\]*$).+

See the regex demo (there, \n is also added to the negated character class since the regex is tested against a single multiline string).

Details

  • (?i) - case insensitive matching
  • ^ - start of string
  • (?!(?:.*[/\\])?TMP(?![^\W_])[^/\\]*$) - a negative lookahead that fails the match if, immediately to the right of the current location, there is
    • (?:.*[/\\])? - an optional occurrence of any 0+ chars other than line break chars, as many as possible, and then / or \
    • TMP(?![^\W_]) - TMP (case insensitive) not followed with a letter or digit (can be followed with _)
    • [^/\\]* - any 0 or more chars other than / and \ (excluding both separators keeps a folder such as TMP_folder1 from being taken for the file name)
    • $ - end of string.
  • .+ - one or more chars other than line break chars.
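
To check the pattern against the sample list from the question, here is a minimal Python sketch. It assumes the whole list is one multiline string, so re.MULTILINE is set and \n is added to the last character class. That class also excludes \ in addition to /, so that a folder such as TMP_folder1 is not mistaken for the file name. The names SAMPLE, PATTERN and non_tmp_lines are only illustrative.

```python
import re

# Sample paths from the question, without the "EXCLUDE" markers.
SAMPLE = r"""c:\folder1\TMP_file.ext
c:\TMP_folder1\file.ext
c:\folder1\TMP_folder2\file.ext
c:\folder1/TMP_file.ext
c:\file.ext
c:\TMP_file.ext
TMP_file.ext
file.ext"""

# re.MULTILINE lets ^ and $ anchor at every line of the string.
# The last class excludes '/', '\' and '\n', so the part after TMP
# must be the final path component of the current line.
PATTERN = re.compile(
    r"(?i)^(?!(?:.*[/\\])?TMP(?![^\W_])[^/\\\n]*$).+",
    re.MULTILINE,
)


def non_tmp_lines(text):
    """Return every line whose file name does not start with the word TMP."""
    return PATTERN.findall(text)


for line in non_tmp_lines(SAMPLE):
    print(line)
```

The four lines marked "EXCLUDE" in the question should be skipped, while paths that merely pass through a TMP_ folder are kept.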
Wiktor Stribiżew