17

I've just started using regular expressions, and they are so overwhelming that even after reading the documentation I can't figure out where to start with my problem.

I have a bunch of strings.

 "Project1 - Notepad"
 "Project2 - Notepad"
 "Project3 - Notepad"
 "Untitled - Notepad"
 "HeyHo - Notepad"

And I have a string containing a wildcard.

"* - Notepad"

I need a comparison of any of these strings against the one containing the wildcard to return true (with Regex.IsMatch() or something like that).

I don't usually ask for answers like this, but I just can't find what I need. Could someone point me in the right direction?

phadaphunk
  • Is the string you want to "compare with" going to be a static string, user-input, etc? In other words, will it always be `* - Notepad` (or similar), or can it be modified during runtime? – newfurniturey Mar 07 '13 at 15:57
  • wildcard would be `.*` (`.` being anything and `*` being repeat zero or more times) – default Mar 07 '13 at 15:57
  • It's going to be a user-input string. It could compare Project1 - Notepad with Project1 - Notepad, but the user would have the choice to include all the Notepad projects with a wildcard. – phadaphunk Mar 07 '13 at 15:59
  • 2
    Similar question is posted here. I hope it helps: http://stackoverflow.com/questions/10400844/validate-that-a-string-contain-some-exact-words-with-regex-c-sharp – Alexander Forbes-Reed Mar 07 '13 at 15:59
  • Are you trying to match anything that ends in "- Notepad" ? – Babblo Mar 07 '13 at 16:00
  • take the string containing `*`, and replace `*` with `.*`, then check matches. – Kent Mar 07 '13 at 16:01
  • @Babblo yes but it is variable. It could be anything that Ends with - Notepad or Untitled {*} - Notepad... the user inputs the wildcard – phadaphunk Mar 07 '13 at 16:01

4 Answers

30

The wildcard * is equivalent to the Regex pattern ".*" (greedy) or ".*?" (not-greedy), so you'll want to perform a string.Replace():

string pattern = Regex.Escape(inputPattern).Replace("\\*", ".*?");

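Note the Regex.Escape(inputPattern) at the beginning. Since inputPattern may contain special characters used by Regex, you need to escape them properly. If you don't, they are interpreted as regex operators rather than literal text, so the pattern may match strings you didn't intend or fail to parse at all. For example: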

Regex.IsMatch(input, ".NET"); // may match ".NET", "aNET", "FNET", "7NET" and many more

Regex.Escape also escapes the wildcard * to \* (written "\\*" in a C# string literal), which is why we replace the escaped wildcard rather than the bare *.
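As a rough illustration of the escaping step, the sketch below prints what Regex.Escape produces for the wildcard string from the question and shows the ".NET" case with and without escaping. The variable names other than inputPattern are made up for the example, and the code assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Wildcard string taken from the question.
string inputPattern = "* - Notepad";

// Regex.Escape turns "*" into "\*" and also escapes the spaces as "\ ".
string escaped = Regex.Escape(inputPattern);

// Swap each escaped wildcard for a lazy "match anything".
string pattern = escaped.Replace("\\*", ".*?");

Console.WriteLine(escaped); // \*\ -\ Notepad
Console.WriteLine(pattern); // .*?\ -\ Notepad

// Without escaping, "." means "any character", so "aNET" matches ".NET".
bool unescapedMatch = Regex.IsMatch("aNET", ".NET");
// With escaping, only a literal dot is accepted.
bool escapedMatch = Regex.IsMatch("aNET", Regex.Escape(".NET"));

Console.WriteLine(unescapedMatch); // True
Console.WriteLine(escapedMatch);   // False
```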


To use the pattern

you can do either:

Regex.IsMatch(input, pattern);

or

var regex = new Regex(pattern);
regex.IsMatch(input);
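One thing to keep in mind: Regex.IsMatch returns true as soon as the pattern matches anywhere in the input. If the wildcard string is meant to describe the whole input (an assumption about the intent, not something the question states outright), anchoring the pattern with ^ and $ is one way to get that. A minimal sketch, using top-level statements (C# 9 or later) and illustrative variable names:

```csharp
using System;
using System.Text.RegularExpressions;

string inputPattern = "* - Notepad";
string pattern = Regex.Escape(inputPattern).Replace("\\*", ".*?");

// Assumption: the wildcard string should cover the entire input.
string anchored = "^" + pattern + "$";

// Unanchored: a match anywhere in the input is enough.
bool loose = Regex.IsMatch("Project1 - Notepad++", pattern);
// Anchored: the trailing "++" prevents a match.
bool strict = Regex.IsMatch("Project1 - Notepad++", anchored);
// Anchored: an exact fit still matches.
bool whole = Regex.IsMatch("Project1 - Notepad", anchored);

Console.WriteLine(loose);  // True
Console.WriteLine(strict); // False
Console.WriteLine(whole);  // True
```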

Difference between greedy and not-greedy

The difference is in how much the pattern will try to match.

Consider the following string: "hello (x+1)(x-1) world". You want to match an opening parenthesis (, a closing parenthesis ), and anything in between.

Greedy (\(.*\)) would match "(x+1)(x-1)" as a single match. It extends to the last closing parenthesis it can reach, i.e. the longest possible substring.

Not-greedy (\(.*?\)) would match "(x+1)" and "(x-1)" separately. In other words, it stops at the first closing parenthesis: the shortest substrings possible.
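The difference can be checked with a short sketch. The patterns \(.*\) and \(.*?\) are one way to express "parentheses and anything in between"; the variable names are illustrative, and the code assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

string text = "hello (x+1)(x-1) world";

// Greedy: ".*" runs as far as it can, so one match spans both groups.
string greedy = Regex.Match(text, @"\(.*\)").Value;
Console.WriteLine(greedy); // (x+1)(x-1)

// Lazy: ".*?" stops at the first ")" it can, giving two separate matches.
MatchCollection lazyMatches = Regex.Matches(text, @"\(.*?\)");
foreach (Match m in lazyMatches)
{
    Console.WriteLine(m.Value); // (x+1) then (x-1)
}
```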


How to escape the wildcard character?

@MaximZabolotskikh asked about the possibility of escaping the wildcard character, so that "Hello \* World" would literally match "Hello * World".

To do this would require multiple substitutions.

  1. Escape the regex.

  2. Substitute any occurrence of \\\\ (double backslash) with the escape character \x1b. This allows us to identify backslashes that were in the original input.

  3. Substitute any occurrence of \x1b\x1b with \\\\. This allows matching a literal \ by using \\.

  4. Use the negative lookbehind pattern (?<!\x1b)\\\* to substitute * with the wildcard pattern (either .* or .*?) but only if it isn't preceded by a backslash. This will insert the wildcard pattern in "Hello * World" and "Hello \\* World", but not in "Hello \* World". We need to match \\\* because * is changed to \* after escaping, so we're actually matching a literal \ (using \\) and a literal * (using \*).

  5. Any escaped * will now be \x1b\\*, which will eventually be substituted to \\\\*, but we actually want it to be \\* instead so we can match a literal * later on. Therefore, substitute \x1b\\* with \\*.

  6. Finally, substitute all \x1b back to \\.

Here's an example (I am using @"" here to avoid typing double backslashes):

string pattern = Regex.Escape(inputPattern);
pattern = pattern.Replace(@"\\", "\x1b");
pattern = pattern.Replace("\x1b\x1b", @"\\");
pattern = Regex.Replace(pattern, @"(?<!\x1b)\\\*", ".*?");
pattern = pattern.Replace("\x1b\\*", @"\*");
pattern = pattern.Replace('\x1b', '\\');
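
To try the substitutions above on a few inputs, they can be wrapped in a small helper. The name WildcardToRegex is hypothetical and the body is the snippet above, unchanged; the sample strings are only illustrations. The sketch assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the six substitutions from the answer, in order.
static string WildcardToRegex(string inputPattern)
{
    string pattern = Regex.Escape(inputPattern);
    pattern = pattern.Replace(@"\\", "\x1b");
    pattern = pattern.Replace("\x1b\x1b", @"\\");
    pattern = Regex.Replace(pattern, @"(?<!\x1b)\\\*", ".*?");
    pattern = pattern.Replace("\x1b\\*", @"\*");
    pattern = pattern.Replace('\x1b', '\\');
    return pattern;
}

// An escaped star (backslash + star) is treated as a literal "*".
string literalStar = WildcardToRegex(@"Hello \* World");
// A bare star is treated as a wildcard.
string wildcard = WildcardToRegex("Hello * World");

Console.WriteLine(Regex.IsMatch("Hello * World", literalStar));   // True
Console.WriteLine(Regex.IsMatch("Hello big World", literalStar)); // False
Console.WriteLine(Regex.IsMatch("Hello big World", wildcard));    // True
```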
Nolonar
  • So in this case string pattern would be * - Notepad ? – phadaphunk Mar 07 '13 at 16:02
  • @PhaDaPhunk Assuming `inputPattern` is "* - Notepad", `string pattern` will be ".*? - Notepad". Keep in mind though, that if `myString` contains other special characters recognized by Regex, such as `. [ ( ) ] \ * + ?` your pattern will be quite messed up. – Nolonar Mar 07 '13 at 16:05
  • What if I **had** to use these characters like "?" or "." is there a way around ? – phadaphunk Mar 07 '13 at 16:43
  • @PhaDaPhunk Yes, there is. Just read the part below "**Edit:**". Basically, all you'll have to do is find all these characters and place a single `\ ` in front of those characters, so `\ ` becomes `\\ ` or `.` becomes `\.` or `?` becomes `\?` etc. – Nolonar Mar 07 '13 at 16:54
  • didn't see that part. Thanks a lot !! – phadaphunk Mar 07 '13 at 18:04
  • in this case, what is: "inputPattern"? – BenKoshy Mar 03 '16 at 06:24
  • 1
    @BKSpurgeon. It's the original pattern: `"* - Notepad"`. In regex, the asterisk `*` matches the previous character 0 or more times, so `"hell*world"` could match `"helworld"`, `"hellworld"`, `"helllworld"`, `"hellllworld"`, etc. If you want to match `"helloworld"` or `"hello world"` or similar instead, you need to replace the `*` in the original pattern (inputPattern) with `.*` or `.*?` – Nolonar Mar 03 '16 at 13:21
  • 1
    Use `string pattern = Regex.Escape(inputPattern).Replace("*", ".*?");`. Otherwise if the pattern already contains some regex special characters (dot for example), then the result will not be valid. – Mahmood Dehghan Jan 07 '19 at 12:33
  • @Mahmoodvcs. Thank you very much. – Nolonar Jan 07 '19 at 14:08
  • @Nolonar Thank you too. Your edit made me aware of a mistake that I was making. – Mahmood Dehghan Jan 07 '19 at 14:49
  • What if my initial pattern is "value with \\*"? I.e. I want * to be a literal character? Then after escaping I have "value\\ with\\\\\\\*" and after replacing "^value\\ with\\\\.*$", which is wrong. – Maxim Zabolotskikh Jan 24 '23 at 08:58
  • @MaximZabolotskikh, thank you for pointing this out. I've edited my answer to address this. – Nolonar Jan 24 '23 at 17:44
  • @Nolonar thank you for the wildcard solution, I will try this out. Meanwhile, I solved this with a direct string inspection https://stackoverflow.com/a/75220383/1105564 – Maxim Zabolotskikh Jan 25 '23 at 13:00
5

I just wrote this quickly (based off of Validate that a string contain some exact words)

    static void Main()
    {
        string[] inputs = 
        {
            "Project1 - Notepad", // True
            "Project2 - Notepad", // True
            "HeyHo - Notepad", // True
            "Nope - Won't work" // False
        };

        const string filterParam = "Notepad";
        var pattern = string.Format(@"^(?=.*\b - {0}\b).+$", filterParam);

        foreach (var input in inputs)
        {
            Console.WriteLine(Regex.IsMatch(input, pattern));
        }
        Console.ReadLine();
    }
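
Since the question says the text to match comes from user input, the filter may contain regex metacharacters. A possible variant, sketched here under that assumption, passes the filter through Regex.Escape before formatting it into the same pattern. The filter value "Notepad.exe" and the test strings are invented for illustration; the code assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical filter containing a regex metacharacter (the dot).
string filterParam = "Notepad.exe";

// Same pattern as above, once with the raw filter and once with it escaped.
string rawPattern = string.Format(@"^(?=.*\b - {0}\b).+$", filterParam);
string safePattern = string.Format(@"^(?=.*\b - {0}\b).+$", Regex.Escape(filterParam));

// Raw: the dot matches any character, so "NotepadXexe" slips through.
Console.WriteLine(Regex.IsMatch("Project1 - NotepadXexe", rawPattern));  // True
// Escaped: only a literal dot is accepted.
Console.WriteLine(Regex.IsMatch("Project1 - NotepadXexe", safePattern)); // False
Console.WriteLine(Regex.IsMatch("Project1 - Notepad.exe", safePattern)); // True
```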
Community
Alexander Forbes-Reed
  • Works great ! Only thing is since Notepad will be variable, using something like this : @"^(?=.*\b - " + variableContainingNotepad + "\b).+$" to change the word Notepad doesn't seem to work because of the "+" characters – phadaphunk Mar 07 '13 at 16:10
  • use string.Format, I'll update my first post. – Alexander Forbes-Reed Mar 07 '13 at 16:14
3

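You can do it like this:

string myPattern = "* - Notepad";
// A bare "*" is not a valid regex, so convert the wildcard to ".*" first.
string regexPattern = "^" + Regex.Escape(myPattern).Replace("\\*", ".*") + "$";
foreach (string currentString in myListOfString)
{
    if (Regex.IsMatch(currentString, regexPattern, RegexOptions.Singleline))
    {
        Console.WriteLine("Found : " + currentString);
    }
}

By the way, I saw you're from Montreal, so here is some additional French documentation and a useful tool: http://www.olivettom.com/?p=84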

Good luck!

Thomas
1

Seems like the pattern you want is the following:

/^.+-\s*Notepad$/

This pattern will match an entire string if it ends with "- Notepad" and has at least one character before the hyphen.
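
The surrounding slashes are delimiters used in languages such as JavaScript or Perl. In C# the pattern would be passed as a plain string without them. A small sketch, with illustrative inputs and top-level statements (C# 9 or later) assumed:

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from above, without the /.../ delimiters.
string pattern = @"^.+-\s*Notepad$";

bool titled = Regex.IsMatch("Untitled - Notepad", pattern);
// Nothing before the hyphen (and no hyphen at all), so no match.
bool bare = Regex.IsMatch("Notepad", pattern);
// Extra characters after "Notepad" break the "$" anchor.
bool suffixed = Regex.IsMatch("Untitled - Notepad++", pattern);

Console.WriteLine(titled);   // True
Console.WriteLine(bare);     // False
Console.WriteLine(suffixed); // False
```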

Daedalus