66

I need to perform Wildcard (*, ?, etc.) search on a string. This is what I have done:

string input = "Message";
string pattern = "d*";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);

if (regex.IsMatch(input))
{
    MessageBox.Show("Found");
}
else
{
    MessageBox.Show("Not Found");
}

With the above code the "Found" block is hit, but it should not be!

Only if my pattern is "e*" should "Found" be hit.

My understanding (and my requirement) is that a d* search should find text containing "d" followed by any characters.

Should I change my patterns to "d.*" and "e.*"? Is there any wildcard support in .NET that does this conversion internally when using the Regex class?

Uwe Keim
Scott

11 Answers

122

From http://www.codeproject.com/KB/recipes/wildcardtoregex.aspx:

public static string WildcardToRegex(string pattern)
{
    return "^" + Regex.Escape(pattern)
                      .Replace(@"\*", ".*")
                      .Replace(@"\?", ".")
               + "$";
}

So something like foo*.xls? will get transformed to ^foo.*\.xls.$.
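
As a minimal sketch of how this conversion behaves on the example from the question, the quoted method can be wrapped in a small helper. The class name WildcardExample and the helper IsWildcardMatch are assumptions made for illustration; they are not part of the linked article.

```csharp
using System.Text.RegularExpressions;

public static class WildcardExample
{
    // Conversion quoted above: escape everything, then re-enable * and ?.
    public static string WildcardToRegex(string pattern)
    {
        return "^" + Regex.Escape(pattern)
                          .Replace(@"\*", ".*")
                          .Replace(@"\?", ".")
                   + "$";
    }

    // Hypothetical helper (not from the article): case-insensitive wildcard match.
    public static bool IsWildcardMatch(string input, string wildcard)
    {
        return Regex.IsMatch(input, WildcardToRegex(wildcard), RegexOptions.IgnoreCase);
    }
}
```

With this, WildcardExample.IsWildcardMatch("Message", "d*") returns false and IsWildcardMatch("Message", "m*") returns true. Because the regex is anchored at both ends, a "contains" search like the one described in the question would need the wildcard *e* rather than e*.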

Alex Angas
Gabe
  • 7
    I think there will be a problem here if escaping is allowed in the wildcard pattern, e.g. if you want to match the `*` character, then the wildcard pattern for that would be `\*`. This would then be transformed into the regex `\\.*` which does something different. – Wim Coenen May 21 '14 at 14:52
  • 3
    A major problem is that this does not escape other regex special characters, so something like `a+.b*` (a valid filename) would match improperly. – dbkk Apr 22 '15 at 13:17
  • 8
    @dbkk The code is calling Regex.Escape first, so it will match your example fine. – dprothero Nov 12 '15 at 20:44
  • I think the wildcard "?" should be translated differently; for example, `*.txt?` would not match any file with a `.txt` extension. I suggest the following code instead, which catches those cases: `string regex = Regex.Escape(wildcard).Replace(@"\*", ".*").Replace(@"\?", ".?"); return "^" + regex + "$";` – MrZweistein Jul 18 '21 at 21:10
22

You can do a simple wildcard match without Regex using a Visual Basic function called LikeString.

using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;

if (Operators.LikeString("This is just a test", "*just*", CompareMethod.Text))
{
  Console.WriteLine("This matched!");
}

If you use CompareMethod.Text, the comparison is case-insensitive. For a case-sensitive comparison, use CompareMethod.Binary.

More info here: http://www.henrikbrinch.dk/Blog/2012/02/14/Wildcard-matching-in-C

MSDN: http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.compilerservices.operators.likestring%28v=vs.100%29.ASPX

Adam Szabo
  • 2
    You need to add a reference to "Microsoft.VisualBasic" – user425678 Jan 16 '15 at 09:11
  • I think there's a typo. It should be `LikeOperator.LikeString`, not `Operators.LikeString`. – Janez Kuhar Dec 23 '22 at 12:31
  • @JanezKuhar there is no typo. Please check out the MSDN link I provided. – Adam Szabo Dec 23 '22 at 20:22
  • @AdamSzabo That's most puzzling. Because the following .NET fiddle reports compilation errors: https://dotnetfiddle.net/bmwua7 – Janez Kuhar Dec 23 '22 at 21:13
  • 1
    @JanezKuhar Check out Microsoft's documentation regarding the namespace: https://learn.microsoft.com/en-us/dotnet/api/microsoft.visualbasic.compilerservices.operators?view=netframework-4.7.2 or just write it in Visual Studio, ensuring you use the proper .NET Framework version (again, see documentation). I wouldn't rely too much on .NET fiddle. – Adam Szabo Dec 28 '22 at 13:45
10

The correct regular expression formulation of the glob expression d* is ^d, which means match anything that starts with d.

    string input = "Message";
    string pattern = @"^d";
    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);

(The @ quoting is not necessary in this case, but good practice since many regexes use backslash escapes that need to be left alone, and it also indicates to the reader that this string is special).
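
A small sketch of the same idea as a reusable check, assuming single-line input strings. The class and method names (PrefixGlobExample, StartsWithD) are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class PrefixGlobExample
{
    // Hypothetical helper: the glob d* expressed as the regex ^d, ignoring case.
    public static bool StartsWithD(string input)
    {
        return Regex.IsMatch(input, @"^d", RegexOptions.IgnoreCase);
    }
}
```

Here StartsWithD("Message") is false, StartsWithD("Data") is true, and StartsWithD("Adder") is false, because the d must be at the start of the string.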

Mark Lakata
  • 1
    Why was this downvoted? This is the correct solution to the OP's example. `^d.*` does the same thing as `^d` but without superfluous characters. – Mark Lakata May 30 '13 at 21:39
  • I did not downvote, but I can see your mistake: ^ char in regex means match at beginning of line... which is not equivalent to wildcard pattern such as "d*". So this is misleading. – scrat.squirrel Mar 17 '14 at 20:38
  • @woohoo - The glob pattern `d*` will only match file names that begin with `d`. The regular expression `/^d/` will only match strings that begin with `d`. They are the same. The `^` pattern in regexs can mean start of string or start of line (i.e. after `\n`) depending on the regex options but that's not applicable here, since the inputs are just strings. – Mark Lakata Mar 17 '14 at 22:39
7

Windows and *nix treat wildcards differently. Windows processes *, ? and . in a very complex way: the presence or position of one can change the meaning of another. *nix keeps it simple and does just one simple pattern match. Besides that, Windows matches ? against 0 or 1 characters, while Linux matches it against exactly 1 character.

I didn't find authoritative documents on this matter; this is just my conclusion based on days of tests on Windows 8/XP (command line, the dir command to be specific; the Directory.GetFiles method uses the same rules too) and Ubuntu Server 12.04.1 (ls command). I made dozens of common and uncommon cases work, although there are many failing cases too.

The current answer by Gabe works like *nix. If you also want a Windows-style one and are willing to accept the imperfection, here it is:

    /// <summary>
    /// <para>Tests if a file name matches the given wildcard pattern, uses the same rule as shell commands.</para>
    /// </summary>
    /// <param name="fileName">The file name to test, without folder.</param>
    /// <param name="pattern">A wildcard pattern which can use char * to match any amount of characters; or char ? to match one character.</param>
    /// <param name="unixStyle">If true, use the *nix style wildcard rules; otherwise use windows style rules.</param>
    /// <returns>true if the file name matches the pattern, false otherwise.</returns>
    public static bool MatchesWildcard(this string fileName, string pattern, bool unixStyle)
    {
        if (fileName == null)
            throw new ArgumentNullException("fileName");

        if (pattern == null)
            throw new ArgumentNullException("pattern");

        if (unixStyle)
            return WildcardMatchesUnixStyle(pattern, fileName);

        return WildcardMatchesWindowsStyle(fileName, pattern);
    }

    private static bool WildcardMatchesWindowsStyle(string fileName, string pattern)
    {
        var dotdot = pattern.IndexOf("..", StringComparison.Ordinal);
        if (dotdot >= 0)
        {
            for (var i = dotdot; i < pattern.Length; i++)
                if (pattern[i] != '.')
                    return false;
        }

        var normalized = Regex.Replace(pattern, @"\.+$", "");
        var endsWithDot = normalized.Length != pattern.Length;

        var endWeight = 0;
        if (endsWithDot)
        {
            var lastNonWildcard = normalized.Length - 1;
            for (; lastNonWildcard >= 0; lastNonWildcard--)
            {
                var c = normalized[lastNonWildcard];
                if (c == '*')
                    endWeight += short.MaxValue;
                else if (c == '?')
                    endWeight += 1;
                else
                    break;
            }

            if (endWeight > 0)
                normalized = normalized.Substring(0, lastNonWildcard + 1);
        }

        var endsWithWildcardDot = endWeight > 0;
        var endsWithDotWildcardDot = endsWithWildcardDot && normalized.EndsWith(".");
        if (endsWithDotWildcardDot)
            normalized = normalized.Substring(0, normalized.Length - 1);

        normalized = Regex.Replace(normalized, @"(?!^)(\.\*)+$", @".*");

        var escaped = Regex.Escape(normalized);
        string head, tail;

        if (endsWithDotWildcardDot)
        {
            head = "^" + escaped;
            tail = @"(\.[^.]{0," + endWeight + "})?$";
        }
        else if (endsWithWildcardDot)
        {
            head = "^" + escaped;
            tail = "[^.]{0," + endWeight + "}$";
        }
        else
        {
            head = "^" + escaped;
            tail = "$";
        }

        if (head.EndsWith(@"\.\*") && head.Length > 5)
        {
            head = head.Substring(0, head.Length - 4);
            tail = @"(\..*)?" + tail;
        }

        var regex = head.Replace(@"\*", ".*").Replace(@"\?", "[^.]?") + tail;
        return Regex.IsMatch(fileName, regex, RegexOptions.IgnoreCase);
    }

    private static bool WildcardMatchesUnixStyle(string pattern, string text)
    {
        var regex = "^" + Regex.Escape(pattern)
                               .Replace("\\*", ".*")
                               .Replace("\\?", ".")
                    + "$";

        return Regex.IsMatch(text, regex);
    }

One funny thing: even the Windows API PathMatchSpec does not agree with FindFirstFile. Just try a1*.: FindFirstFile says it matches a1, PathMatchSpec says it does not.

deerchao
  • For those who really want to have the exact behavior of Windows, I can only give you a hint: `FsRtlIsNameInExpression` and `FindFirstFile`. – deerchao Jun 12 '13 at 17:17
  • +1 for the research, especially the discovery that the two APIs disagree – Marcel Popescu Jun 28 '13 at 11:25
  • The question was not really explicit about whether it was about just matching a string against a pattern (probably) or against the current directory. And indeed the latter case is awful, because Windows silently matches the pattern not only against the visible file names but also against the hidden DOS 8.3 file names, which sometimes produces unexpected matches! – Stéphane Gourichon Oct 28 '14 at 19:06
5

d* means that it should match zero or more "d" characters. So any string is a valid match. Try d+ instead!

In order to have support for wildcard patterns I would replace the wildcards with the RegEx equivalents. Like * becomes .* and ? becomes .?. Then your expression above becomes d.*
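
To see why d* "finds" something in any input, it can help to look at what the regex actually matched. This is only an illustrative sketch; the class EmptyMatchExample and the method FirstMatch are names invented here.

```csharp
using System.Text.RegularExpressions;

public static class EmptyMatchExample
{
    // Returns the text the regex actually matched, or null when there is no match.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Value : null;
    }
}
```

FirstMatch("Message", "d*") returns an empty string: the regex succeeds by matching zero "d" characters at the start of the input. FirstMatch("Message", "d+") and FirstMatch("Message", "d.*") both return null, while FirstMatch("Message", "e.*") returns "essage".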

Anders Zommarin
  • 1
    +1 for being the only person to explain why `d*` gave OP unexpected results. But two concerns: (1) Isn't the regex `d.*` equivalent to simply `d`? (2) Wildcard `d*` implies the input should start with `d`, whereas regex `d.*` permits `d` anywhere in the input. – Disillusioned Dec 17 '13 at 14:55
3

You need to convert your wildcard expression to a regular expression. For example:

    private bool WildcardMatch(String s, String wildcard, bool case_sensitive)
    {
        // Replace the * with an .* and the ? with a dot. Put ^ at the
        // beginning and a $ at the end
        String pattern = "^" + Regex.Escape(wildcard).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";

        // Now, run the Regex as you already know
        Regex regex;
        if(case_sensitive)
            regex = new Regex(pattern);
        else
            regex = new Regex(pattern, RegexOptions.IgnoreCase);

        return(regex.IsMatch(s));
    } 
carlos357
3

You must escape special Regex symbols in the input wildcard pattern (for example, the pattern *.txt is equivalent to ^.*\.txt$). So slashes, braces, and many other special symbols must be replaced with @"\" + s, where s is the special Regex symbol.
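
A short sketch of the difference escaping makes, using Regex.Escape to do the replacement described above instead of handling each symbol by hand. Both method names are hypothetical, and the naive version is included only for comparison.

```csharp
using System.Text.RegularExpressions;

public static class EscapingExample
{
    // Naive conversion that forgets to escape regex metacharacters (for comparison only).
    public static string NaiveWildcardToRegex(string wildcard)
    {
        return "^" + wildcard.Replace("*", ".*").Replace("?", ".") + "$";
    }

    // Escapes every special symbol first, then re-enables the two wildcard characters.
    public static string EscapedWildcardToRegex(string wildcard)
    {
        return "^" + Regex.Escape(wildcard).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
    }
}
```

For the wildcard *.txt, the naive version yields ^.*.txt$, which also matches "notes_txt" because the dot is still a regex metacharacter. The escaped version yields ^.*\.txt$, which matches "notes.txt" but not "notes_txt".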

Gabe
Camarada
1

I think @Dmitri has a nice solution in "Matching strings with wildcard": https://stackoverflow.com/a/30300521/1726296

Based on his solution, I have created two extension methods. (Credit goes to him.)

It may be helpful.

public static String WildCardToRegular(this String value)
{
        return "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$";
}

public static bool WildCardMatch(this String value,string pattern,bool ignoreCase = true)
{
        if (ignoreCase)
            return Regex.IsMatch(value, WildCardToRegular(pattern), RegexOptions.IgnoreCase);

        return Regex.IsMatch(value, WildCardToRegular(pattern));
}

Usage:

string pattern = "file.*";

var isMatched = "file.doc".WildCardMatch(pattern);

or

string xlsxFile = "file.xlsx";
var isMatched = xlsxFile.WildCardMatch(pattern);
Community
Tejasvi Hegde
0

None of the code above is entirely correct.

This is because searching for zz*foo* or zz* will not give correct results.

Also, if you search for "abcd*" in Total Commander, it finds a file named abcd, so all of the code above is wrong.

Here is the correct code.

public string WildcardToRegex(string pattern)
{             
    string result= Regex.Escape(pattern).
        Replace(@"\*", ".+?").
        Replace(@"\?", "."); 

    if (result.EndsWith(".+?"))
    {
        result = result.Remove(result.Length - 3, 3);
        result += ".*";
    }

    return result;
}
Tzah Mama
0

You may want to use WildcardPattern from System.Management.Automation assembly. See my answer here.

Community
VirtualVDX
0

The most popular answer works fine and can be used in most scenarios:

"^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";

However, if you allow escaping in your input wildcard pattern, e.g. "find \*", meaning you want to search for the string "find *" with a literal asterisk, it won't work. Regex.Escape turns the already escaped \* into \\\*, and after the replacement we have the regex ^find\ \\.*$, which is wrong: it matches "find \" followed by anything.
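
The failure can be reproduced with a few lines. This is just a sketch that wraps the one-line conversion shown above; the class and method names are assumptions for illustration.

```csharp
using System.Text.RegularExpressions;

public static class EscapedWildcardFailureExample
{
    // The simple conversion: it has no notion of an escaped \* or \? in the wildcard.
    public static string SimpleWildcardToRegex(string pattern)
    {
        return "^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
    }

    // Hypothetical helper: match text against a wildcard using the simple conversion.
    public static bool SimpleMatch(string text, string wildcard)
    {
        return Regex.IsMatch(text, SimpleWildcardToRegex(wildcard));
    }
}
```

With the wildcard @"find \*", SimpleWildcardToRegex produces ^find\ \\.*$. As a result, SimpleMatch("find *", @"find \*") is false, while SimpleMatch(@"find \abc", @"find \*") is true, which is the opposite of what an escaped asterisk is supposed to mean.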

The following code (which for sure can be optimized and rewritten) handles that special case:

  public static string WildcardToRegex(string wildcard)
    {
        var sb = new StringBuilder();
        for (var i = 0; i < wildcard.Length; i++)
        {
            // If wildcard has an escaped \* or \?, preserve it like it is in the Regex expression
            var character = wildcard[i];
            if (character == '\\' && i < wildcard.Length - 1)
            {
                if (wildcard[i + 1] == '*')
                {
                    sb.Append("\\*");
                    i++;
                    continue;
                }

                if (wildcard[i + 1] == '?')
                {
                    sb.Append("\\?");
                    i++;
                    continue;
                }
            }

            switch (character)
            {
                // If it's unescaped * or ?, change it to Regex equivalents. Add more wildcard characters (like []) if you need to support them.
                case '*':
                    sb.Append(".*");
                    break;
                case '?':
                    sb.Append('.');
                    break;
                default:
                    //// Escape all other symbols because wildcard could contain Regex special symbols like '.'
                    sb.Append(Regex.Escape(character.ToString()));
                    break;
            }
        }

        return $"^{sb}$";
    }

A solution to this problem that uses only Regex substitutions is proposed here: https://stackoverflow.com/a/15275806/1105564

Maxim Zabolotskikh