How to find a pattern match in text, with exceptions if the match occurs within a substring?

Question

I want to determine whether any string from a list of rejectable strings occurs in some text, but only when that occurrence is not part of a larger allowable string (from a list of allowable strings) at the same place in the text.

Simple example:

Text: "The quick red fox jumped over the lazy brown dog in front of the farmer."

rejectableStrings: "fox", "dog", "farmer"

allowableStrings: "quick red fox", "smurfy blue fox", "lazy brown dog", "old green farmer"

So, raise a flag if any of the strings "fox", "dog", or "farmer" is found in the text, but not if that occurrence is contained within one of the allowable strings at the same location in the text. (In this example, "fox" and "dog" are allowable, but "farmer" should raise the flag, because the text says "the farmer", not "old green farmer".)

Example logic not yet complete:

// requires: using System; using System.Text.RegularExpressions;
string status = "allowable";
foreach (string rejectableString in rejectableStrings)
{
  // check whether rejectableString occurs as a whole word, bounded by whitespace or the start/end of the text
  // https://stackoverflow.com/a/16213482/56082
  string invalidValuePattern = string.Format(@"(?<!\S){0}(?!\S)", Regex.Escape(rejectableString));
  if (Regex.IsMatch(text, invalidValuePattern, RegexOptions.IgnoreCase))
  {
    // it is found, so we initially raise the flag and check further
    status = "flagged";
    foreach (string allowableString in allowableStrings)
    {
      // only consider an allowableString that contains the rejectableString
      // (case-insensitively, like the match above); ignore the others
      if (allowableString.IndexOf(rejectableString, StringComparison.OrdinalIgnoreCase) >= 0)
      {
        // check whether the occurrence of rejectableString found in text is actually part of this allowableString in text

        // *** the area that needs attention ***
        // placeholder: should be true when this occurrence of rejectableString
        // lies inside an occurrence of allowableString in text
        bool occurrenceIsInsideAllowable = false;
        if (occurrenceIsInsideAllowable)
        {
          // this occurrence of rejectableString is actually allowable: change status back to allowable and break out of the allowable foreach
          status = "allowable";
          break;
        }
      }
    }
    if (status.Equals("flagged"))
    {
      throw new Exception(rejectableString.ToUpper() + " found in text is not allowed.");
    }
  }
}
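One way to fill in the marked area is to work with match positions instead of a yes/no match: record the span of every allowable occurrence in the text, then accept a rejectable occurrence only when one of those spans fully covers it. This is a sketch, assuming case-insensitive matching as in the question; the names `QueryValidator` and `FindDisallowed` are hypothetical. It also swaps the question's whitespace lookarounds for `\b` word boundaries, because `(?!\S)` fails when a word is followed by punctuation (such as `farmer.` in the example text).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QueryValidator
{
    // Hypothetical helper: returns the first rejectable string that occurs in text
    // outside every occurrence of an allowable string, or null if there is none.
    public static string FindDisallowed(string text, string[] rejectable, string[] allowable)
    {
        // Collect the [start, end) span of every allowable occurrence in the text.
        var allowedSpans = new List<(int Start, int End)>();
        foreach (string allowed in allowable)
        {
            foreach (Match m in Regex.Matches(text, Regex.Escape(allowed), RegexOptions.IgnoreCase))
            {
                allowedSpans.Add((m.Index, m.Index + m.Length));
            }
        }

        foreach (string reject in rejectable)
        {
            // Whole-word match using \b (assumes rejectable strings are plain words).
            string pattern = @"\b" + Regex.Escape(reject) + @"\b";
            foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
            {
                int start = m.Index;
                int end = m.Index + m.Length;
                // This occurrence is allowed only if some allowable span fully covers it.
                bool covered = allowedSpans.Any(s => s.Start <= start && end <= s.End);
                if (!covered)
                {
                    return reject;
                }
            }
        }
        return null;
    }

    public static void Main()
    {
        string[] rejectable = { "fox", "dog", "farmer" };
        string[] allowable = { "quick red fox", "smurfy blue fox", "lazy brown dog", "old green farmer" };
        string text = "The quick red fox jumped over the lazy brown dog in front of the farmer.";

        Console.WriteLine(FindDisallowed(text, rejectable, allowable) ?? "allowable");
    }
}
```

With the example lists this prints `farmer`: both `fox` and `dog` sit inside allowable phrases, but the final `farmer` is not covered by `old green farmer`.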

Background, if you're interested: this is for an SQL query validation method in an app. The goal is to reject queries that contain commands which permanently modify the database, but still accept a query if the command found is actually part of a temporary-table command or some other logical exception that should allow it. The validation has to work across multiple database products, not just one.

So the real-world rejectable and allowable strings are:

private string[] rejectableStrings = {"insert","update","set","alter",
   "create","delete"};
private string[] allowableStrings = { "insert into #", "create table #",
   "create global temporary table ", "create temporary tablespace ", "offset "};

and the text is an SQL query.
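For the SQL lists, a single-pass alternative is one regex that lists the allowable phrases before the rejectable words. .NET regexes scan left to right and try alternatives in order at each position, so an allowable phrase that starts at or before a rejectable word is consumed as a whole, and the word inside it is never reported on its own. This is a sketch under those assumptions, not an SQL parser: it does not skip string literals or comments, and the names `SqlCommandCheck` and `FirstRejected` are hypothetical. Spaces in the allowable phrases are widened to `\s+` (an assumption about what should count as a match) so that line breaks or extra spaces in a query still match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SqlCommandCheck
{
    static readonly string[] RejectableStrings = { "insert", "update", "set", "alter", "create", "delete" };
    static readonly string[] AllowableStrings = { "insert into #", "create table #",
        "create global temporary table ", "create temporary tablespace ", "offset " };

    // Allowable phrases first (escaped, with each escaped space widened to \s+),
    // then the rejectable words as whole words. At each position the engine tries
    // the "allow" alternatives before the "reject" ones.
    static readonly Regex Scanner = new Regex(
        "(?<allow>" + string.Join("|", AllowableStrings.Select(a => Regex.Escape(a).Replace("\\ ", @"\s+"))) + ")" +
        @"|(?<reject>\b(?:" + string.Join("|", RejectableStrings.Select(r => Regex.Escape(r))) + @")\b)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the first rejected command as written in the query, or null if the query passes.
    public static string FirstRejected(string sql)
    {
        foreach (Match m in Scanner.Matches(sql))
        {
            if (m.Groups["reject"].Success)
            {
                return m.Value;
            }
        }
        return null;
    }

    public static void Main()
    {
        string[] queries = {
            "insert into #temp values (1)",
            "INSERT INTO orders VALUES (1)",
            "SELECT * FROM t ORDER BY id OFFSET 10 ROWS"
        };
        foreach (string q in queries)
        {
            string rejected = FirstRejected(q);
            Console.WriteLine(rejected == null ? "valid: " + q : "rejected (" + rejected + "): " + q);
        }
    }
}
```

The temporary-table insert and the `OFFSET` query pass, while `INSERT INTO orders` is rejected. The trade-off compared with removing text first is that the query is left untouched, and each match's `Index` is available if you want to report where the rejected command appears.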

score 3 · Accepted Answer · edited Dec 31 '18 at 21:56

You can do this by first removing every allowable string from the text and then checking what remains for rejectable ones. Once the allowable strings are gone, any rejectable word you still find cannot be part of an allowed phrase.

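using System;
using System.Text.RegularExpressions;

public class Program
{
   public static void Main(string[] args)
   {
      string[] rejectableStrings = new string[] {"fox", "dog", "farmer"};
      string[] allowableStrings = new string[] {"quick red fox", "smurfy blue fox", 
                                                "lazy brown dog", "old green farmer"};
      string teststr = "fox quick red fox";
      bool pass = true;

      // Remove every allowable phrase first. Escape it so it is matched literally, and
      // replace it with a space so the text on either side is not glued into a new word.
      foreach (string allowed in allowableStrings)
      {
         teststr = Regex.Replace(teststr, Regex.Escape(allowed), " ", RegexOptions.IgnoreCase);
      }

      // Any rejectable string still left was not part of an allowable phrase.
      foreach (string reject in rejectableStrings)
      {
         if (Regex.IsMatch(teststr, Regex.Escape(reject), RegexOptions.IgnoreCase))
         {
            pass = false;
         }
      }
      Console.WriteLine(pass);
   }
}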

Try it Online
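The rejectable check above is a plain substring match, so with the real SQL lists `set` would also be found inside words such as `settings`. A hedged variant, assuming every rejectable string is a single word made of letters, combines the remove-then-check idea with `\b` word boundaries (`RemoveThenCheck` and `Passes` are hypothetical names):

```csharp
using System;
using System.Text.RegularExpressions;

public static class RemoveThenCheck
{
    // Hypothetical helper: true if no rejectable word remains once allowable phrases are removed.
    public static bool Passes(string text, string[] rejectable, string[] allowable)
    {
        foreach (string allowed in allowable)
        {
            // Replace with a space, not "", so the text on either side cannot be joined into a new word.
            text = Regex.Replace(text, Regex.Escape(allowed), " ", RegexOptions.IgnoreCase);
        }
        foreach (string reject in rejectable)
        {
            // \b word boundaries: "set" no longer matches inside "offset" or "settings".
            if (Regex.IsMatch(text, @"\b" + Regex.Escape(reject) + @"\b", RegexOptions.IgnoreCase))
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        string[] rejectable = { "insert", "update", "set", "alter", "create", "delete" };
        string[] allowable = { "insert into #", "create table #" };
        Console.WriteLine(Passes("SELECT settings FROM config", rejectable, allowable));
        Console.WriteLine(Passes("UPDATE config SET x = 1", rejectable, allowable));
    }
}
```

The first call prints `True` because `settings` is no longer mistaken for `set`; the second prints `False` because `UPDATE` appears as a whole word.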

Nice! - excellent approach. I also like the online tester tool. I think it still needs adjustment for case insensitivity in the replacement and matches. — Streamline, Dec 31 '18 at 21:24
