I'm pretty bad at Regex (C#), so I break things down into parts. The goal of the following series of Regex statements is to take an arbitrary string and reduce it to a lower-case string of the form "this is a test of 4mg/cc susp".

This is what I've been doing:

// Test string
string str1 = @"     This is\ 'a'   test   of 4mg/cc susp  ";

// Remove special characters except for space and /
str1 = Regex.Replace(str1, @"[^0-9a-zA-Z /]+", "");

// Collapse each run of whitespace into a single space. Trim the ends.
str1 = Regex.Replace(str1.Trim(), @"\s+", " ");

// Convert all to lower case
str1 = str1.ToLower();

Is there a single Regex (C#) statement that can accomplish all the above?

Alan Wayne
  • Why do you want to combine them? I think that would make it less readable. I would just append the `ToLower()` call to the second `Regex.Replace()` and leave the first one as is. – 41686d6564 stands w. Palestine Apr 11 '22 at 03:07
  • @Wiktor Seriously? This question has absolutely nothing to do with the dup-target. a) The main problem is about combining two `Regex.Replace` calls into one, not about case conversion. b) Even the part about case conversion is for the entire string, not a "part of string". – 41686d6564 stands w. Palestine Apr 11 '22 at 18:16

1 Answer

I would argue that trying to combine both patterns into one would make it less readable. You could keep using two calls to `Regex.Replace()` and just append `.ToLower()` to the second one:

// Remove special characters except for space and /
str1 = Regex.Replace(str1, @"[^0-9a-zA-Z /]+", "");

// Collapse each run of whitespace into a single space, trim the ends, and convert to lower case.
str1 = Regex.Replace(str1.Trim(), @"\s+", " ").ToLower();
//                                             ^^^^^^^^^

That said, if you really have to use a one-liner, you could write something like this:

str1 = Regex.Replace(str1, @"[^A-Za-z0-9 /]+|( )+", "$1").Trim().ToLower();

The pattern matches either a run of characters outside the allowed set (letters, digits, space, and `/`) or a run of one or more spaces, with the space placed in a capturing group. Each match is then replaced with whatever group 1 captured: nothing for a run of disallowed characters, or a single space for a run of spaces.
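
One thing worth checking before relying on the one-liner: when a removed character sits between two spaces, each space is a separate match, so both survive. The sketch below compares the two approaches on such an input. The class and method names are hypothetical, and `OneLinerVariant` is only one possible adjustment, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OneLinerCheck
{
    // The two-call approach from the answer.
    public static string TwoCalls(string s)
    {
        s = Regex.Replace(s, @"[^0-9a-zA-Z /]+", "");
        return Regex.Replace(s.Trim(), @"\s+", " ").ToLower();
    }

    // The one-liner from the answer.
    public static string OneLiner(string s) =>
        Regex.Replace(s, @"[^A-Za-z0-9 /]+|( )+", "$1").Trim().ToLower();

    // Hypothetical variant (an assumption, not from the answer): a space also
    // swallows any following run of spaces and disallowed characters.
    public static string OneLinerVariant(string s) =>
        Regex.Replace(s, @"[^A-Za-z0-9 /]+|( )[^A-Za-z0-9/]*", "$1").Trim().ToLower();

    public static void Main()
    {
        string sample = @"     This is\ 'a'   test   of 4mg/cc susp  ";
        Console.WriteLine(OneLiner(sample));        // this is a test of 4mg/cc susp

        string tricky = @"4mg \ cc";                // removed character between two spaces
        Console.WriteLine(TwoCalls(tricky));        // 4mg cc
        Console.WriteLine(OneLiner(tricky));        // 4mg  cc (two spaces)
        Console.WriteLine(OneLinerVariant(tricky)); // 4mg cc
    }
}
```

If inputs like `tricky` can occur, keeping the two calls is the simpler way to stay safe.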

For the sake of completeness, if you want to also handle the trimming with regex (and make the pattern even less readable), you could use:

str1 = Regex.Replace(str1, @"[^A-Za-z0-9 /]+|^ +| +$|( )+", "$1").ToLower();
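
A caveat with trimming inside the pattern: `^ +` and ` +$` only see spaces that are at the very start or end of the original string, so a space that becomes leading or trailing only after a neighbouring character is removed is kept. A minimal sketch of the difference, using hypothetical class and method names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimCheck
{
    // Trimming done inside the pattern, as in the last snippet above.
    public static string RegexTrim(string s) =>
        Regex.Replace(s, @"[^A-Za-z0-9 /]+|^ +| +$|( )+", "$1").ToLower();

    // Trimming left to string.Trim(), as in the previous one-liner.
    public static string MethodTrim(string s) =>
        Regex.Replace(s, @"[^A-Za-z0-9 /]+|( )+", "$1").Trim().ToLower();

    public static void Main()
    {
        string sample = @"     This is\ 'a'   test   of 4mg/cc susp  ";
        Console.WriteLine("[" + RegexTrim(sample) + "]");  // [this is a test of 4mg/cc susp]

        string quoted = "' 4mg/cc '";                      // spaces hidden behind quotes
        Console.WriteLine("[" + RegexTrim(quoted) + "]");  // [ 4mg/cc ]
        Console.WriteLine("[" + MethodTrim(quoted) + "]"); // [4mg/cc]
    }
}
```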
  • A side note: Regex also supports case changing `\L` `\U`. For this question, one can use `\L$1`. – Xiang Wei Huang Apr 11 '22 at 05:35
  • @Xiang I've thought of that but unfortunately, case conversion in replacement text is not supported by .NET's regex engine (at least not pure regex). Your only option would be using a `MatchEvaluator` and calling `.ToLower()` on _individual_ matches, which is overly complicated and just redundant because you can just call `.ToLower()` on the entire string and be done with it. – 41686d6564 stands w. Palestine Apr 11 '22 at 05:44
  • :O Didn't know that doesn't work in .NET! Thank you for correcting me. I wouldn't use that in C# anyway; it's a better fit for quick-and-dirty string handling in text editors like VSCode. – Xiang Wei Huang Apr 11 '22 at 05:50
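
To illustrate the `MatchEvaluator` route mentioned in the comments, here is a rough sketch of what lower-casing individual matches might look like. The class name, method name, and the extra `([A-Za-z]+)` alternative are assumptions made for this example; it mainly shows why calling `.ToLower()` on the whole string is simpler.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvaluatorSketch
{
    public static string Normalize(string s)
    {
        // Group 1: a space from a run of spaces; group 2: a run of letters.
        // Disallowed characters match the first alternative with no group set,
        // so Groups[1].Value is empty for them and they are dropped.
        string result = Regex.Replace(
            s,
            @"[^A-Za-z0-9 /]+|( )+|([A-Za-z]+)",
            m => m.Groups[2].Success ? m.Value.ToLower() : m.Groups[1].Value);
        return result.Trim();
    }

    public static void Main()
    {
        string str1 = @"     This is\ 'a'   test   of 4mg/cc susp  ";
        Console.WriteLine(Normalize(str1)); // this is a test of 4mg/cc susp
    }
}
```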