My C# .NET program uses:

DirInfo.EnumerateFiles(Program.sSourceFilePattern, SearchOption.TopDirectoryOnly)

to search a folder for filenames matching 'sSourceFilePattern'. This search pattern is user-supplied, and I want to validate it before calling DirInfo.EnumerateFiles.

I found a regex pattern in the question "How do I check if a given string is a legal / valid file name under Windows?" and modified it to permit the wildcard characters * and ?:

sPattern = @"^(?!^(PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(\..+)?$)[^\x00-\x1f\\:\"";|/]+$";
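
To see what the regex alone accepts and rejects, here is a minimal sketch that runs it against a few sample strings. The class name PatternRegexDemo and the helper PassesRegex are made up for this illustration; the pattern itself is copied unchanged from the line above.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternRegexDemo
{
    // The pattern from the question, unchanged.
    const string sPattern =
        @"^(?!^(PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(\..+)?$)[^\x00-\x1f\\:\"";|/]+$";

    // Hypothetical helper: true when the string passes the regex check only.
    public static bool PassesRegex(string s) =>
        Regex.IsMatch(s, sPattern, RegexOptions.CultureInvariant);

    static void Main()
    {
        // Wildcards pass; reserved device names and illegal characters do not.
        foreach (var s in new[] { "abc*.txt", "abc*123.txt", "NUL.txt", "a:b.txt" })
        {
            Console.WriteLine($"{s} -> {PassesRegex(s)}");
        }
    }
}
```

The first two strings pass and the last two fail, so the regex handles reserved names and illegal characters but says nothing about where a wildcard sits.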

This pattern works fairly well, but it still permits patterns with nonsensical wildcard placement. For example, it accepts invalid search strings like:

abc*123.txt
abc*???.txt
*abc.txt

I think that refining this further will involve more than regexes, because it requires applying logic about where the asterisks may occur and what may follow them, whether that is before or after the period (separator), etc.

Nevertheless, I would appreciate any suggestions for improving this regex to catch more of the common errors. Thanks in advance!

KMorley
  • Why would your examples be invalid search strings? 1 and 3 make sense, and 2 could be reduced to `abc*.txt` but is still valid from where I see it. – Laurent S. Oct 17 '14 at 12:50
  • What exactly makes a search string invalid, in your case? Is it a directory that the application may not search in? – Nzall Oct 17 '14 at 12:55
  • Thanks for the comments and your points are both well taken. My goal was to validate a search string as syntactically correct, rather than just validate those that don't bomb the program. Fortunately, both the Windows command prompt and DirInfo.EnumerateFiles are very tolerant. If an invalid search pattern is supplied, they just don't return any matches. – KMorley Oct 17 '14 at 19:54
  • Windows search patterns use asterisks and question marks as wild cards, where the question mark matches any single character and the asterisk matches through end of string. Question marks can appear anywhere in the search pattern but asterisks are more limited. A syntactically correct search pattern has 0-2 asterisks. If the pattern has a single asterisk, it should be the last char of the filename (just before the period separator) or the last char of the filename extension. If two asterisks, one should be the last char of the filename and the second is the last char of the extension. – KMorley Oct 17 '14 at 20:04

1 Answer

I concluded that the asterisk wildcard rules were too complex for any regex I could design, so I handled them with logic instead. It turned out to be simpler than I originally expected:

if (bResult = Regex.IsMatch(sResult, sPattern, RegexOptions.CultureInvariant))
{
    // Reuse bResult and preset to false.  Only passing all tests sets to true:
    bResult = false;

    // True - no reserved words or illegal characters, so test further.
    // Check wild card placement. '?' may appear anywhere, but '*' follows specific rules.
    // Use LINQ to count occurrences of asterisk.  Zero to two is acceptable:
    iCount = sResult.Count(f => f == '*');

    if (iCount == 0)
    {
        // No asterisks, so search pattern testing is finished and the pattern is good.
        bResult = true;
    }
    else if (iCount == 1)
    {
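        // One asterisk, so test further.  It must be either the last character in the
        // string or the last character of the name, i.e. immediately before the period:
        iIdx = sResult.IndexOf('*');
        if (iIdx == sResult.Length - 1 || sResult[iIdx + 1] == '.')
        {
            // One asterisk and it IS correctly placed.
            bResult = true;
        }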
    }
    else if (iCount == 2)
    {
        // Two asterisks, so test further.  The first asterisk can ONLY be followed
        // by period.  The second asterisk must be the last character in the string:
        iIdx = sResult.IndexOf("*");
        if (sResult.Substring(iIdx+1,1) == ".")
        {
            // First asterisk is followed by period, so test further:
            if (sResult.Length == sResult.LastIndexOf("*")+1)
            {
                // Second asterisk is the last character, so good search pattern.
                bResult = true;
            }
        }
    }
}
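
The checks above can be collected into one method. This is only a sketch and not the exact code from the fragment: the names SearchPatternValidator and IsValidSearchPattern are hypothetical, and it assumes the rule stated in the comments on the question, namely that a single asterisk may end either the name (just before the period) or the whole string.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SearchPatternValidator
{
    // The pattern from the question, unchanged.
    const string sPattern =
        @"^(?!^(PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(\..+)?$)[^\x00-\x1f\\:\"";|/]+$";

    // Hypothetical helper: regex check first, then asterisk placement.
    public static bool IsValidSearchPattern(string sResult)
    {
        if (string.IsNullOrEmpty(sResult) ||
            !Regex.IsMatch(sResult, sPattern, RegexOptions.CultureInvariant))
        {
            return false;
        }

        int iCount = sResult.Count(c => c == '*');
        if (iCount == 0) return true;   // '?' may appear anywhere.
        if (iCount > 2) return false;   // More than two asterisks is never accepted.

        int iFirst = sResult.IndexOf('*');
        int iLast = sResult.LastIndexOf('*');
        bool bLastEndsString = iLast == sResult.Length - 1;
        bool bFirstBeforeDot = iFirst + 1 < sResult.Length && sResult[iFirst + 1] == '.';

        // One asterisk: it ends the string or sits immediately before a period.
        if (iCount == 1) return bLastEndsString || bFirstBeforeDot;

        // Two asterisks: the first precedes a period and the second ends the string.
        return bFirstBeforeDot && bLastEndsString;
    }

    static void Main()
    {
        foreach (var s in new[] { "*.*", "abc*.txt", "abc.t*", "abc*123.txt", "*abc.txt" })
        {
            Console.WriteLine($"{s} -> {IsValidSearchPattern(s)}");
        }
    }
}
```

Under these assumptions the first three sample patterns are accepted and the last two, both taken from the question, are rejected.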
KMorley