I'm trying to find out whether at least one file within a directory matches a pattern (the patterns use only the "?" and "*" wildcards), but some combinations keep throwing the nested quantifier error. For example, TestCashFile_10_12-25-2016????????.c?? doesn't work.

The patterns come from non-technical users (who have been taught the basic usage of these two wildcards), so "?" and "*" can appear pretty much anywhere in the file name and I don't have much control over them.

What is wrong with these patterns?

This is the C# code snippet that runs this regex -

string fileName = @"C:\TestFiles\TestCashFile_10_12-25-2016????????.c??";
string directory = Path.GetDirectoryName(fileName);
string[] temp = fileName.Split('\\');
string file = temp[temp.Length - 1];
var found = Directory.GetFiles(directory).Any(p => Regex.Match(p, file).Success);

Update - The question has been resolved but in case it helps someone else looking for something similar, just to clarify - In this case, I wanted "?" to mean that there must be exactly one element (as opposed to zero or one element).

Achilles
  • See [Need to perform Wildcard (*,?, etc) search on a string using Regex](http://stackoverflow.com/questions/6907720/need-to-perform-wildcard-etc-search-on-a-string-using-regex). – Wiktor Stribiżew Jan 10 '17 at 16:44
  • [`Directory.GetFiles()` supports wildcards already](https://msdn.microsoft.com/en-us/library/wz42302f%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396). Is there something I'm missing? You seem to be using filesystem wildcards as if they were a regular expression. They aren't. That can't possibly work. – 15ee8f99-57ff-4f92-890c-b56153 Jan 10 '17 at 16:46
  • File wildcards and regular expressions are not the same thing. Anyway, what @EdPlunkett said should be carefully considered before you attempt to reinvent the wheel. – spender Jan 10 '17 at 16:49
  • @EdPlunkett , @spender I just tried `Directory.GetFiles(directory, file)`, it seems to work more like `C:\TestFiles\TestCashFile_10_12-25-2016*.c*` – Achilles Jan 10 '17 at 16:53
  • @Achilles "Seems to". "More like". Call me Horatio, but I'm not familiar with any filesystem where wildcard matching is ambiguous or subjective. – 15ee8f99-57ff-4f92-890c-b56153 Jan 10 '17 at 16:55
  • @EdPlunkett I just tried what you suggested and it simply doesn't work. TestCashFile_10_12-25-2016????????.c?? with your suggestion returns files "TestCashFile_10_12-25-2016.csv", "TestCashFile_10_12-25-2016_B.csv" etc. – Achilles Jan 10 '17 at 16:59
  • @Achilles Yes, those would be valid matches for the wildcard pattern that you've specified. If you want semantics that match neither regex behavior nor filesystem matching, but rather something else entirely, then, at a minimum, you're going to need to define what you want, and you're also going to need to write the code to perform that matching yourself. – Servy Jan 10 '17 at 17:01
  • @Achilles Thank you, now I understand what you mean. – 15ee8f99-57ff-4f92-890c-b56153 Jan 10 '17 at 17:04

2 Answers

The ? operator specifies that the previous element can occur 0 or 1 time.

https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx

`?`: Matches the previous element zero or one time. For example, the pattern "rai?n" matches "ran" and "rain".

If you use the wildcards built into Directory.GetFiles like @Ed Plunkett said, it should work similar to what you are looking for.

If you still want to use your current method with RegEx, do something like the following:

  • .* - any number of characters
  • .{n} - replace n with the number of expected characters
  • .{m,n} - replace m with the min number of expected characters, and n with the max number of expected characters.
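
As a rough illustration of the quantifiers listed above, the sketch below hand-writes a regex for the pattern from the question, using `.{8}` for the eight `?` characters and `.{2}` for the two in the extension. The sample file names are made up for the example, and the `^` and `$` anchors are an added assumption so that the whole name has to match. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hand-written regex for TestCashFile_10_12-25-2016????????.c??
// .{8} = exactly eight of any character, \. = a literal dot, .{2} = exactly two.
// The ^ and $ anchors are an assumption: they force the whole name to match.
var exact = new Regex(@"^TestCashFile_10_12-25-2016.{8}\.c.{2}$");

// Hypothetical file names, for illustration only.
Console.WriteLine(exact.IsMatch("TestCashFile_10_12-25-2016_ABCDEFG.csv")); // True: eight extra characters
Console.WriteLine(exact.IsMatch("TestCashFile_10_12-25-2016_D.csv"));       // False: only two extra characters
Console.WriteLine(exact.IsMatch("TestCashFile_10_12-25-2016.csv"));         // False: no extra characters
```

Note the escaped `\.` before the extension: an unescaped `.` would match any character rather than a literal dot.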
ps2goat
  • Thank you. My mistake, I had imagined that `?` will ensure that exactly one element must be present for each `?`. I just tried to change it to TestCashFile_10_12-25-2016{2}.c{2} though and it doesn't return any results either. There are files with names "TestCashFile_10_12-25-2016_D.csv" and "TestCashFile_10_12-25-2016_D.csv" etc present in the folder. – Achilles Jan 10 '17 at 17:10
  • you need the `.` (period/dot), which indicates any character in this example. so try `TestCashFile_10_12-25-2016.{2}\\.c.{2}`. you would also need to add the escape slashes on the file extension's `.` to treat it as the actual character, though that brings issues of its own since you are splitting on \\ already... – ps2goat Jan 10 '17 at 18:54

If you need "??" to match exactly two of any character then you're right, you'll have to use regexes. Filesystem wildcarding treats "?" as "zero or one of any character".

But you can't do it the way you tried to, because you're asking your users for filesystem wildcards -- you're just altering the semantics a bit. You'll have to turn the strings from the user into the regex you want:

a???.*

Has to become

a...\..*
  • Each question mark becomes ".": Exactly one of any character.
  • Each "." becomes "\.", because unescaped "." is a special character in a regex.
  • Each "*" has to become ".*": Zero or more of any character (guessing on this one).

Do that to the file string, and your .Any(p => Regex.Match(p, file).Success); should work.

You might want to compile the regex though, if things get a little slow at runtime:

file = TranslateWildcardsToRegex(file);
var re = new Regex(file);

var found = Directory.GetFiles(directory).Any(p => re.IsMatch(p));

I think this is right for TranslateWildcardsToRegex():

public static String TranslateWildcardsToRegex(String s)
{
    StringBuilder sb = new StringBuilder();

    foreach (var ch in s)
    {
        switch (ch)
        {
            case '?':
                sb.Append(".");
                break;

            case '*':
                sb.Append(".*");
                break;

            //  Escape a variety of characters that 
            //  mean something special in a regex
            case '(':
            case ')':
            case '{':
            case '}':
            case '[':
            case ']':
            case '.':
                sb.Append("\\" + ch);
                break;

            default:
                sb.Append(ch);
                break;
        }
    }

    return sb.ToString();
}

UPDATE

In comments @spender offers a much nicer and cleaner way to do the same thing:

var reStr = Regex.Escape(someWildcardThing).Replace(@"\?", ".").Replace(@"\*", ".*");

I have no good excuse for not doing it that way myself, other than still being a recovering C programmer, after all these years.
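
For anyone who wants the pieces in one place, here is a minimal sketch built on the `Regex.Escape` approach. The helper names `WildcardToRegex` and `AnyFileMatches` are hypothetical, and three things go beyond what the answer states and are assumptions on my part: the `^` and `$` anchors, the `IgnoreCase` option, and matching against `Path.GetFileName(p)` rather than the full path. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: turns a user wildcard pattern into a regex.
// "?" means exactly one character and "*" means zero or more.
static Regex WildcardToRegex(string wildcard)
{
    // Regex.Escape turns "?" into "\?" and "*" into "\*",
    // so those escaped forms are what gets replaced afterwards.
    string body = Regex.Escape(wildcard).Replace(@"\?", ".").Replace(@"\*", ".*");
    // Assumptions: anchor so the whole name must match, and ignore case.
    return new Regex("^" + body + "$", RegexOptions.IgnoreCase);
}

// Hypothetical helper: true if any file in the directory matches the wildcard.
static bool AnyFileMatches(string directory, string wildcard)
{
    Regex regex = WildcardToRegex(wildcard);
    // Assumption: test the file name only, not the full path.
    return Directory.EnumerateFiles(directory)
                    .Any(p => regex.IsMatch(Path.GetFileName(p)));
}

var demo = WildcardToRegex("TestCashFile_10_12-25-2016????????.c??");
Console.WriteLine(demo); // prints the generated pattern
Console.WriteLine(demo.IsMatch("TestCashFile_10_12-25-2016_ABCDEFG.csv")); // hypothetical name, eight extra characters
Console.WriteLine(demo.IsMatch("TestCashFile_10_12-25-2016_D.csv"));       // hypothetical name, two extra characters
```

Building the `Regex` once and reusing it for every file avoids re-parsing the pattern on each call. Without the anchors, a pattern such as `a???` would also match any longer name that merely contains a matching substring.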

  • Thank you. This works. The nested quantifier exception seems to have resolved itself as well with this change. – Achilles Jan 10 '17 at 17:15
  • It was a System.ArgumentException with a message "Nested Quantifier..." that my pattern in the question was throwing on Regex.Match. Seems to be a very common question on stackoverflow - http://stackoverflow.com/questions/210206/what-is-a-nested-quantifier-and-why-is-it-causing-my-regex-to-fail but I couldn't find a solution for my specific case. Good to know that fixing the regex fixed that as well. – Achilles Jan 10 '17 at 17:21
  • Ohhh, OK, I get it: By regex syntax rules, "a?*" is a whimsical thing to request: "Zero or more of zero or one of a". So for whatever reason they raise an exception. Now with the string manipulation converting the wildcards to proper regex, that can't happen. "a?*" becomes "a..*", which is legal. That also reminds me though -- I need to make some more changes to that translate function. – 15ee8f99-57ff-4f92-890c-b56153 Jan 10 '17 at 17:27
  • @Achilles I just added an updated version of `TranslateWildcardsToRegex()`. It addresses some special characters that could break the old one. – 15ee8f99-57ff-4f92-890c-b56153 Jan 10 '17 at 17:30
  • @EdPlunkett I can't help feeling that this could be significantly improved with the use of `Regex.Escape`. Couldn't this be done with something as simple as `Regex.Escape(someWildcardThing).Replace(@"\?", ".").Replace(@"\*", ".*")` ? – spender Jan 10 '17 at 23:08
  • @spender That would be much simpler and better code, you're right. – 15ee8f99-57ff-4f92-890c-b56153 Jan 10 '17 at 23:23
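
The nested quantifier exception discussed in the comments above can be reproduced with a small check. This is only a sketch: `IsInvalidRegex` is a hypothetical helper, and it relies on the `Regex` constructor reporting a malformed pattern with `ArgumentException`, which is the exception type the asker reported. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true when the string is rejected as a regex pattern.
static bool IsInvalidRegex(string pattern)
{
    try
    {
        _ = new Regex(pattern);
        return false;
    }
    catch (ArgumentException)
    {
        // Malformed patterns, including nested quantifiers, end up here.
        return true;
    }
}

// Raw wildcard used as a regex: "6????????" stacks quantifiers on quantifiers.
Console.WriteLine(IsInvalidRegex("TestCashFile_10_12-25-2016????????.c??"));   // True
// After translation each "?" is a plain ".", so there is nothing left to nest.
Console.WriteLine(IsInvalidRegex(@"TestCashFile_10_12-25-2016........\.c..")); // False
```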