I can't work out why Regex.Replace with a negated character class is not replacing newlines with a space.

Here's some sample code:

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string testInput = "This is a test. \n This is a newline. \n this is another newline. This is a, comma";

            Console.WriteLine(testInput);

            // get rid of line breaks and other characters that are not allowed
            string commentFix = Regex.Replace(testInput, @"[^A-Z\sa-z\.0-9\-\:\;\$]", " ");
            commentFix = "\"" + commentFix + "\"";

            Console.WriteLine("\n");

            Console.WriteLine(commentFix);
            Console.ReadLine();
        }
    }
}

The output of this is:

This is a test.
 This is a newline.
 this is another newline. This is a, comma

"This is a test.
 This is a newline.
 this is another newline. This is a  comma"

Any ideas? (thanks, this is my first question!)

  • You don't have anything in your pattern that matches newline. See my post here: http://stackoverflow.com/questions/28743851/regular-expression-to-match-any-vertical-whitespace – rory.ap Apr 04 '16 at 19:01
  • Simply do: Regex.Replace(testInput, @"\n+", " "); – Quinn Apr 04 '16 at 19:06

1 Answer

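\s matches a newline. A negated character class matches only characters that are not listed in it, and since \s is listed inside yours, whitespace (line breaks included) is never matched and therefore never replaced.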

MSDN lists what \s matches:

\f - The form feed character, \u000C.
\n - The newline character, \u000A.
\r - The carriage return character, \u000D.
\t - The tab character, \u0009.
\v - The vertical tab character, \u000B.
\x85 - The ellipsis or NEXT LINE (NEL) character (…), \u0085.
\p{Z} - Matches any separator character.
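
A quick way to confirm this is to test single characters against both \s and the pattern from the question. This is only a minimal sketch; the class name WhitespaceCheck and the constant OriginalPattern are made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the question's code.
static class WhitespaceCheck
{
    // The pattern from the question, unchanged.
    public const string OriginalPattern = @"[^A-Z\sa-z\.0-9\-\:\;\$]";

    public static void Main()
    {
        // \s on its own matches a line feed and a carriage return.
        Console.WriteLine(Regex.IsMatch("\n", @"\s"));            // True
        Console.WriteLine(Regex.IsMatch("\r", @"\s"));            // True

        // Because \s is listed inside [^...], a newline is excluded from the match.
        Console.WriteLine(Regex.IsMatch("\n", OriginalPattern));  // False

        // A comma is not listed, so it is matched (and would be replaced).
        Console.WriteLine(Regex.IsMatch(",", OriginalPattern));   // True
    }
}
```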

So, if you want whitespace to be replaced as well, just take \s out of the class. I guess you also want each run of matched characters replaced with a single space, so add +, which matches one or more occurrences of the pattern it quantifies:

[^A-Za-z.0-9:;$-]+

See the regex demo
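
Applied to the test string from the question, the pattern above could be used roughly like this. It is a sketch only: the CommentCleaner class and its Clean method are names invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the suggested pattern.
static class CommentCleaner
{
    // Replaces every run of characters outside the allowed set with one space.
    // \s is no longer in the class, so spaces and line breaks are matched too.
    public static string Clean(string input)
    {
        return Regex.Replace(input, @"[^A-Za-z.0-9:;$-]+", " ");
    }

    public static void Main()
    {
        string testInput = "This is a test. \n This is a newline. \n this is another newline. This is a, comma";

        // Prints: "This is a test. This is a newline. this is another newline. This is a comma"
        Console.WriteLine("\"" + Clean(testInput) + "\"");
    }
}
```

Note that ordinary spaces are now matched as well, but since each run is replaced by a single space, words stay separated.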

Also note that you do not have to escape ., :, ; and $ inside a character class, and you do not have to escape - if it is at the beginning/end of the character class.

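If you plan to match whitespace with the exception of CR and LF, use [^\S\r\n]: [^A-Z\S\r\na-z.0-9:;$-]+. Here, [^\S] matches a whitespace character, but \r and \n are also listed inside the negated character class, so they are not matched. Note that \S already covers every other character listed in that class, so the longer pattern behaves the same as [^\S\r\n]+ and matches only whitespace other than CR and LF.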
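
As an illustration of [^\S\r\n], the sketch below collapses runs of spaces and tabs while leaving line breaks in place. The HorizontalWhitespace class and its Collapse method are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper showing the "whitespace except CR and LF" class.
static class HorizontalWhitespace
{
    // Replaces each run of whitespace other than \r and \n with one space.
    public static string Collapse(string input)
    {
        return Regex.Replace(input, @"[^\S\r\n]+", " ");
    }

    public static void Main()
    {
        // The spaces and the tab between "a" and "b" become one space;
        // the \r\n before "c" is left untouched.
        Console.WriteLine(Collapse("a \t  b\r\nc"));
    }
}
```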

Wiktor Stribiżew

  • Thank you. I did forget to add the + to the end of my example. I think I mostly didn't understand that \s included \r\n. I use regular expressions constantly inside a program suite called "Laserfiche" but apparently they intentionally change their pattern matches to not go across line breaks unless you explicitly tell them to. So I'll have to break myself of the habit of using \s anytime I simply want a space and/or tab and use [ \t] instead. – Chris Hagen Apr 04 '16 at 19:51