I am trying to create a regex filter that detects whether a string contains a certain substring but none of a set of excluded words.

For example, I want to find all the strings that contain the substring "account manager" while excluding plain managers, senior managers and senior account managers.

I have tried using re.findall(r"account|manager") and then checking whether the resulting list has length 2 and doesn't contain the words senior or sr.

Instead, I would like a single expression that excludes the words senior and sr and includes the words account and manager, so that the check returns True or False as follows for these examples:

sr manager - False
senior key account manager - False
sr. key account manager - False
account manager - True
key account manager - True
manager - False
account manager - True

I tried to create something like the following, which is incorrect: (?!senior|sr)(key|account|manager)

Does anyone know the right way to check for such a condition?

Niko Gamulin
  • Find the strings that contain "account manager." Find the strings that contain the words that you do not want. Take the set difference. – DYZ May 19 '21 at 20:05
  • This was just an example and may not illustrate my question well: I want to check for words that are not necessarily adjacent, and exclude certain words. – Niko Gamulin May 19 '21 at 20:09
  • @NikoGamulin, you can try [`^(?:(?!sr\.|senior).)*account\s+manager`](https://regex101.com/r/zjNSKL/2) – Olvin Roght May 19 '21 at 20:16
  • You may try: `^(?!.*\b(?:sr\.?|senior)\s.*\bmanager\b).*\baccount\s+manager\b` – anubhava May 19 '21 at 20:18
  • @NikoGamulin I was typing up a long answer but unfortunately this was closed (why?!). My suggestion is to use the `regex` module in python which allows variable-width lookbehinds, and you can do something like: `(?<!((?:(?:senior)|(?:sr))).*)account manager`, link here: https://regex101.com/r/aKK8P4/1/, screenshot here: https://gyazo.com/af6f7420e4be098d898bb22d889b6521 – David542 May 19 '21 at 20:23
  • Closed because it is a dupe. This is a common question, match a string containing one pattern but not another. `^(?!.*(?:sr\.|senior)).*account\s+manager`. Or, `^(?!.*\b(?:sr\.|senior\b)).*\baccount\s+manager\b` – Wiktor Stribiżew May 19 '21 at 20:26
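
The lookahead patterns suggested in the comments can be tried with the standard `re` module. The following is a minimal sketch, not any commenter's exact pattern: it uses a slight variation, `\b(?:sr|senior)\b`, so that both "sr" and "sr." are rejected, and the names `PATTERN` and `is_plain_account_manager` are made up for illustration.

```python
import re

# Assumed variation of the patterns from the comments:
#   ^(?!.*\b(?:sr|senior)\b)  fails the match if "sr" or "senior" appears
#                             as a whole word anywhere in the string
#   .*\baccount\s+manager\b   then requires "account manager" somewhere
PATTERN = re.compile(
    r"^(?!.*\b(?:sr|senior)\b).*\baccount\s+manager\b",
    re.IGNORECASE,
)


def is_plain_account_manager(title: str) -> bool:
    """Hypothetical helper: True if the title matches PATTERN."""
    return PATTERN.search(title) is not None


# The examples from the question.
for title in [
    "sr manager",
    "senior key account manager",
    "sr. key account manager",
    "account manager",
    "key account manager",
    "manager",
]:
    print(f"{title} - {is_plain_account_manager(title)}")
```

Because the negative lookahead is anchored at `^` and begins with `.*`, it scans the whole string for an excluded word before the rest of the pattern is attempted. That is the difference from `(?!senior|sr)(key|account|manager)`, where the lookahead only inspects the position at which the match starts.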

1 Answer

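Relying on regex for a simple task is generally a poor idea. Here's a simple, easy-to-read function that passes all your test cases when called with "account manager" as the target substring and ["senior", "sr"] as the excluded strings.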

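from typing import List


def validate_str(s: str, target_substring: str, excluded_strs: List[str]) -> bool:
    """Return True if s contains target_substring and none of excluded_strs."""
    # Reject strings that lack the required substring.
    if target_substring not in s:
        return False
    # Reject strings that contain any of the excluded substrings.
    return not any(excluded in s for excluded in excluded_strs)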
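
The function above uses substring containment, so an excluded string such as "sr" also matches inside longer words (for example "israel"), and the target words must be adjacent. If the goal is to check for whole words that are not necessarily adjacent, as mentioned in the comments on the question, a set-based variant is one option. This is only a sketch under those assumptions; `has_words` and its parameters are hypothetical names, and punctuation handling is limited to stripping it from the ends of each word.

```python
import string
from typing import Iterable


def has_words(s: str, required: Iterable[str], excluded: Iterable[str]) -> bool:
    """Hypothetical helper: True if every required word occurs in s as a
    whole word and no excluded word does. Expects lowercase word lists."""
    # Lowercase, split on whitespace, and strip punctuation such as the
    # trailing dot in "sr." so that it compares equal to "sr".
    words = {w.strip(string.punctuation) for w in s.lower().split()}
    return set(required) <= words and words.isdisjoint(excluded)


required = {"account", "manager"}
excluded = {"senior", "sr"}

for title in [
    "sr manager",
    "senior key account manager",
    "sr. key account manager",
    "account manager",
    "key account manager",
    "manager",
    "israel account manager",
]:
    print(f"{title} - {has_words(title, required, excluded)}")
```

Note that this variant ignores word order, so a string like "manager of the account" also passes. Whether that is acceptable depends on the data.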
William Bradley