6

I am very new to RegEx -- so can someone please help me figure out what exactly is going wrong here?

I have this code:

       string regPattern = "*[~#%&*{}/<>?|\"-]+*";
       string replacement = "";
       Regex regExPattern = new Regex(regPattern);

Yet when my app hits the regExPattern line, I get an ArgumentException: "Quantifier {x,y} following nothing".

Can someone help?

EDIT: I need to pass this pattern into a foreach loop like so:

    if (paths.Contains(regPattern))
    {
        foreach (string files2 in paths)
        {
            try
            {
                string filenameOnly = Path.GetFileName(files2);
                string pathOnly = Path.GetDirectoryName(files2);
                string sanitizedFileName = regExPattern.Replace(filenameOnly, replacement);
                string sanitized = Path.Combine(pathOnly, sanitizedFileName);
                //write to streamwriter
                System.IO.File.Move(files2, sanitized);
            }
            catch (Exception ex)
            {
                //write to streamwriter
            }
        }
    }
    else
    {
        //write to streamwriter
    }

How do I define the pattern if it is being passed into this loop?

yeahumok
  • To be specific: the pattern in my code is meant to strip invalid characters from file names, so I need to remove asterisks, tildes, pound signs, curly brackets, angle brackets, etc. Is this the correct pattern for that? – yeahumok Jun 28 '10 at 19:45

5 Answers

7

Update: after reading the comment to the question I think you want simply this:

s = Regex.Replace(s, "[~#%&*{}/<>?|\"-]+", "");
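
As a rough sketch of how that one-liner might slot into the loop from the question: the `paths` array and the `sanitize` helper below are made-up names for illustration, and the file move is left out so the snippet has no side effects. The question's `paths.Contains(regPattern)` check looks for the pattern text itself rather than for a regex match, so this sketch tests each file name with `IsMatch` instead.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Verbatim string: "" is a literal quote, and the trailing - is a literal hyphen.
var invalidChars = new Regex(@"[~#%&*{}/<>?|""-]+");

// Hypothetical helper: strips every run of unwanted characters from a file name.
Func<string, string> sanitize = name => invalidChars.Replace(name, "");

// Hypothetical input; the question's own `paths` collection would go here.
string[] paths = { "docs/re#port~1.txt", "docs/clean.txt" };

foreach (string path in paths)
{
    string fileName = Path.GetFileName(path);

    // Skip files whose names are already clean.
    if (!invalidChars.IsMatch(fileName))
        continue;

    string directory = Path.GetDirectoryName(path) ?? "";
    string target = Path.Combine(directory, sanitize(fileName));

    // A real version would call File.Move(path, target) here.
    Console.WriteLine($"{path} -> {target}");
}
```

Only the file name is passed through the regex, so the `/` in the character class cannot eat directory separators.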

Old answer: I guess when you write * you are thinking of wildcards such as those you would enter at a shell:

*.txt

This is not how the * works in regular expression syntax. What you probably want instead is .*:

".*[~#%&*{}/<>?|\"-]+.*"

The . means "any character" and the * means "zero or more of the previous".

Inside the character class [...] the * loses its special meaning and becomes a literal character so it does not need to be escaped. Escaping it unnecessarily inside the character class will not cause any harm and some people find it easier to read.
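
The rules above can be checked with a few small calls. This is only an illustrative sketch; the test strings and variable names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Outside a character class, a leading * has nothing to repeat,
// so constructing the question's pattern throws an ArgumentException.
bool threw = false;
try
{
    _ = new Regex("*[~#%&*{}/<>?|\"-]+*");
}
catch (ArgumentException)
{
    threw = true;
}

// Inside [...] the * is a literal asterisk, escaped or not.
bool unescapedInClass = Regex.IsMatch("a*b", "[*]");
bool escapedInClass = Regex.IsMatch("a*b", @"[\*]");

// Outside a class, \* is a literal asterisk and .* is "any run of characters".
bool escapedOutside = Regex.IsMatch("a*b", @"a\*b");
bool dotStar = Regex.IsMatch("axyzb", "a.*b");

Console.WriteLine($"{threw} {unescapedInClass} {escapedInClass} {escapedOutside} {dotStar}");
```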

Mark Byers
  • By * I mean a literal asterisk, not any character. Do I still need to backslash it? – yeahumok Jun 28 '10 at 19:44
  • The meaning of `*` changes depending on whether it is inside a character class or not. Inside a character class it means a literal `*` whether or not it is escaped. Outside of a character class it means "zero or more" if unescaped and a literal `*` if escaped. – Mark Byers Jun 28 '10 at 20:06
  • +1 (for the revision), and in C# you should use verbatim strings for regexes. They don't use the backslash as an escape character; you only have to escape the quotation mark with another quotation mark: `@"[~#%&*{}/<>?|""-]+"` – Alan Moore Jun 29 '10 at 07:02
2

The * is a quantifier meaning "zero or more times" (same as {0,}). To match a literal asterisk outside a character class, escape it with a backslash like this: \*

Daniel Egeberg
0

Add . before the *

e.g. string regPattern = ".*[~#%&*{}/<>?|\"-]+.*";

Zong
Alpesh
0

Since you're using Regex.Replace to replace each of these characters with an empty string, a single character class with no quantifier is enough:

        // The hyphen goes last so it is a literal; written as \\\\-^ it would form a range from \ to ^.
        string pattern = "[~#%&*{}/()<>?|\"\\\\^\\[\\]-]";

        string input = @"(*&af%\#$}afd]a#f%hjg{d(^(^[RF*()^FR(7r5";

        string output = Regex.Replace(input, pattern, String.Empty);
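
One thing to watch in a character class like this is where the hyphen sits: between two characters it forms a range, while at the very end it is a literal. A small sketch of the difference, using an invented input string:

```csharp
using System;
using System.Text.RegularExpressions;

// Hyphen between \ and ^ forms the range \ ] ^ (U+005C to U+005E),
// so the literal hyphen in the input is left alone and the ] is removed.
string midHyphen = Regex.Replace("a-b]c", @"[\\-^]", "");

// Hyphen placed last is a literal: it is removed, and ] no longer matches.
string lastHyphen = Regex.Replace("a-b]c", @"[\\^-]", "");

Console.WriteLine(midHyphen);   // a-bc
Console.WriteLine(lastHyphen);  // ab]c
```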
Toby
0

The * does not work on its own as a wildcard here. I tried it: to make it work, add a . before the *, so the wildcard as a whole is .* and the pattern should then work.

rish90