While the answer related to using a match evaluator delegate / function in Regex.Replace()
to resolve the substitution is pretty smooth, I'd like to add another take that uses regex only.
There are a couple of assumptions here:
- A value can be prefixed by any sequence of zeros, which is to be ignored; 0+ is to be treated as 0.
- A value suffixed by zeroes keeps those zeroes, such that e.g. 1 and 10000 are different values (that one's a bit of a no-brainer).
With that, we have two paths:
- Any zero-prefix followed by a single zero captures only the last occurrence of 0. This could be written as 0*? (?<value> 0).
- Any zero-prefix followed by a non-zero digit, followed by any sequence of digits ignores the leading zeroes. This could be written as 0*? (?<value> [1-9]\d*).
By using the fact that only one of the two conditions will ever be met, we can combine both patterns e.g. using a pattern like this:
0*? ( (?<value> [1-9]\d* ) | (?<value> 0 ) )
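As a quick sanity check of the combined pattern, here is a minimal sketch. The ^ and $ anchors and the Normalize helper are additions made for this illustration only; they are not part of the pattern above. It relies on .NET allowing the same group name in both alternatives.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: strips leading zeros from a string made of digits only.
// ^ and $ are added here so that the whole input has to match.
static string Normalize(string digits)
{
    var m = Regex.Match(
        digits,
        @"^ 0*? ( (?<value> [1-9]\d* ) | (?<value> 0 ) ) $",
        RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);

    // Only one alternative can succeed, so "value" holds a single capture.
    return m.Success ? m.Groups["value"].Value : digits;
}

foreach (var s in new[] { "007", "000", "10000", "0100" })
{
    Console.WriteLine($"{s} -> {Normalize(s)}");
}
```

With these inputs, the leading zeros are dropped while trailing zeros stay, and an all-zero input collapses to a single 0.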
That combined pattern is already the key to the solution; the following code tells the regex engine to ignore whitespace in the pattern (i.e. uses \s to match it explicitly) for readability and suppresses all unnamed groups.
private const string Pattern1 = @"
    0*? ( (?<length> [1-9]\d* ) | (?<length> 0 ) )
    \s* x
    \s* 0*? ( (?<width> [1-9]\d* ) | (?<width> 0 ) )
    \s* (?<unit> cm )";

private const RegexOptions Options = RegexOptions.Compiled |
                                     RegexOptions.ExplicitCapture |
                                     RegexOptions.IgnorePatternWhitespace;

private static readonly Regex Regex = new Regex(Pattern1, Options);
With this, you can then run the replacements using
var result = Regex.Replace(input, @"${length} x ${width} ${unit}");
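Put together, the regex-only variant could look roughly like the sketch below. It is written as a small top-level program; the Reformat wrapper and the lowercase local names are placeholders chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

const string pattern = @"
    0*? ( (?<length> [1-9]\d* ) | (?<length> 0 ) )
    \s* x
    \s* 0*? ( (?<width> [1-9]\d* ) | (?<width> 0 ) )
    \s* (?<unit> cm )";

var regex = new Regex(
    pattern,
    RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);

// Hypothetical wrapper around the Replace() call shown above.
string Reformat(string input) =>
    regex.Replace(input, "${length} x ${width} ${unit}");

Console.WriteLine(Reformat("08x09cm"));  // 8 x 9 cm
Console.WriteLine(Reformat("0 x10cm"));  // 0 x 10 cm
Console.WriteLine(Reformat("let's look at 10000x00100cm as well"));
// let's look at 10000 x 100 cm as well
```

Text around the match is left alone, since Replace() only substitutes the matched span.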
A side note: The \d+-like patterns obviously do not account for any culture-specific formatting of numbers at all. This definitely starts to get ugly when the requirement shifts from integral numbers to fractional ones. For example, the number 202667.4 could be represented as 202.667,40 in a German culture setting (ignoring the thousands separator, which may or may not already slap one in the face). Nobody should write numbers like this, but here we are.
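To make the culture issue concrete, here is a small sketch. It assumes a hand-built NumberFormatInfo that mimics the German conventions, rather than loading a real culture, so that it does not depend on the culture data installed on the machine.

```csharp
using System;
using System.Globalization;

// Assumption: a hand-built format standing in for German conventions
// ("." groups thousands, "," separates the decimals).
var german = new NumberFormatInfo
{
    NumberDecimalSeparator = ",",
    NumberGroupSeparator = ".",
};

var text = 202667.4m.ToString("N2", german);
Console.WriteLine(text); // 202.667,40

// Parsing with the same format information round-trips the value.
var parsed = decimal.Parse(text, NumberStyles.Number, german);
Console.WriteLine(parsed == 202667.4m); // True
```

A pattern built around \d+ and a literal dot has no way of knowing which of the two separators carries the decimals.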
Edit
Since the question was changed to support fractional numbers as well, the naive change would be to add (\. \d+)? to the regexes. However, as mentioned above, cultural differences are weird - this is where the already suggested match evaluator comes in.
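For reference, the naive change might look like the sketch below. It assumes that the fractional part always uses a dot and is optional; the Reformat wrapper is a placeholder name for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: the fractional part is optional and always uses "." as separator.
var regex = new Regex(@"
    0*? ( (?<length> [1-9]\d* (\. \d+)? ) | (?<length> 0 (\. \d+)? ) )
    \s* x
    \s* 0*? ( (?<width> [1-9]\d* (\. \d+)? ) | (?<width> 0 (\. \d+)? ) )
    \s* (?<unit> cm )",
    RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);

// Hypothetical wrapper around the Replace() call.
string Reformat(string input) =>
    regex.Replace(input, "${length} x ${width} ${unit}");

Console.WriteLine(Reformat("000.50x02.25cm")); // 0.50 x 2.25 cm
Console.WriteLine(Reformat("08x09cm"));        // 8 x 9 cm
```

This works only as long as every input sticks to the dot as decimal separator.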
With a match evaluator, you could try using a much more relaxed pattern such as
private const string Pattern2 = @"
    (?<length> [\d\.,]+ )
    \s* x
    \s* (?<width> [\d\.,]+ )
    \s* (?<unit> cm )";

private static readonly Regex Regex2 = new Regex(Pattern2, Options);
and then pass the formatting to the delegate:
// Needs: using System.Globalization; using System.Text.RegularExpressions;
var replaced = Regex2.Replace(input, Format);

string Format(Match m)
{
    // If either number cannot be parsed, hand back the match unchanged.
    if (!decimal.TryParse(m.Groups["length"].Value, NumberStyles.Any, CultureInfo.CurrentCulture, out var length))
    {
        return m.ToString();
    }

    if (!decimal.TryParse(m.Groups["width"].Value, NumberStyles.Any, CultureInfo.CurrentCulture, out var width))
    {
        return m.ToString();
    }

    var unit = m.Groups["unit"].Value;
    return $"{length} x {width} {unit}";
}
Note the use of decimal to avoid rounding issues, as well as NumberStyles.Any and CultureInfo.CurrentCulture to capture the correct format. The Format() method also gracefully returns the original match unchanged if one of the numbers is not in a valid format; if this method throws an exception instead, the exception propagates and the whole Regex.Replace() call fails.
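The fallback behaviour can be tried out with a sketch like the following. Unlike the code above, it pins the culture to CultureInfo.InvariantCulture (an assumption made here so that the output does not depend on the machine's regional settings) and folds both checks into one condition.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

var regex = new Regex(@"
    (?<length> [\d\.,]+ )
    \s* x
    \s* (?<width> [\d\.,]+ )
    \s* (?<unit> cm )",
    RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);

// Assumption: InvariantCulture instead of CurrentCulture, for reproducible output.
var culture = CultureInfo.InvariantCulture;

string Format(Match m)
{
    if (!decimal.TryParse(m.Groups["length"].Value, NumberStyles.Any, culture, out var length) ||
        !decimal.TryParse(m.Groups["width"].Value, NumberStyles.Any, culture, out var width))
    {
        return m.Value; // not a valid number: leave the matched text untouched
    }

    return string.Format(culture, "{0} x {1} {2}", length, width, m.Groups["unit"].Value);
}

Console.WriteLine(regex.Replace("08x09cm", Format));     // 8 x 9 cm
Console.WriteLine(regex.Replace("1.2.3 x 4cm", Format)); // 1.2.3 x 4cm
```

The second input matches the relaxed pattern but is rejected by decimal.TryParse(), so it comes back as it went in.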
Also note that the above code does not try to deal with whitespace within numbers, such as 10 000. If you want to support formats like these, you'd have to deal with it in the match evaluator method; specifically, decimal.TryParse() may not interpret 10 000 as a valid number, depending on the selected CultureInfo.
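One possible way to handle this inside the match evaluator is to strip whitespace before parsing. The TryParseLoose helper below is a hypothetical name for such a step and uses InvariantCulture as an assumption; the pattern would also have to let the number groups match inner whitespace, which is not shown here.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: drops every whitespace character before parsing,
// so that "10 000" is read as 10000.
static bool TryParseLoose(string text, out decimal value)
{
    var compact = new string(text.Where(c => !char.IsWhiteSpace(c)).ToArray());
    return decimal.TryParse(compact, NumberStyles.Number, CultureInfo.InvariantCulture, out value);
}

Console.WriteLine(
    TryParseLoose("10 000", out var n)
        ? n.ToString(CultureInfo.InvariantCulture)
        : "invalid"); // 10000
```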
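For fun and profit, here are some test inputs, run through both the regex-only replacement (Reformat1) and the match evaluator version (Reformat3):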
var inputs = new[]
{
    "08x09cm",
    "0x10cm",
    "0 x10cm",
    "0x 0cm",
    "000.00 x 00000cm",
    "let's look at 10000x00100cm as well",
    "no x way cm",
    ".0 x 1. cm",
};

foreach (var input in inputs)
{
    Console.WriteLine("- input: {0}\n result: {1}\n result: {2}", input, Reformat1(input), Reformat3(input));
}
This would print
- input: 08x09cm
result: 8 x 9 cm
result: 8 x 9 cm
- input: 0x10cm
result: 0 x 10 cm
result: 0 x 10 cm
- input: 0 x10cm
result: 0 x 10 cm
result: 0 x 10 cm
- input: 0x 0cm
result: 0 x 0 cm
result: 0 x 0 cm
- input: 000.00 x 00000cm
result: 000.0 x 0 cm
result: 0.00 x 0 cm
- input: let's look at 10000x00100cm as well
result: let's look at 10000 x 100 cm as well
result: let's look at 10000 x 100 cm as well
- input: no x way cm
result: no x way cm
result: no x way cm
- input: .0 x 1. cm
result: .0 x 1. cm
result: 0.0 x 1 cm
(This is mostly testing sunny cases, but it's a start.)