var temp = "08x09cm";
temp = Regex.Replace(temp, "[x,c]", " $0");
temp = Regex.Replace(temp, "[x]", "$0 ");
temp = Regex.Replace(temp, "(^[0])|([\\s][0])", " ");
temp = Regex.Replace(temp, "(^[0])", "");

The final output should be: 8 x 9 cm

After following jdweng's answer:

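// \d+(?:\.\d+)? also accepts single-digit values such as "5x5cm";
// the earlier \d+\.?\d+ required at least two digits per number.
temp = Regex.Replace(temp,
@"(?'w'\d+(?:\.\d+)?)x(?'h'\d+(?:\.\d+)?)(?'ext'\w*)",
delegate (Match m) {
    return $"{decimal.Parse(m.Groups["w"].Value)} x {decimal.Parse(m.Groups["h"].Value)} {m.Groups["ext"].Value}";
});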

I added the ext group to keep the unit, and switched from int to decimal so that fractional values work as well.

Sample Results:
Input: "05.55x55Meter" Output: "5.55 x 55 Meter"
Input: "05x05cm" Output: "5 x 5 cm"
Input: "11x11m" Output: "11 x 11 m"

Edited:

sunside has the most robust solution, covering every scenario I can throw at it.

Here's my implementation of this:

The (?<unit> \w+) group allows a unit of any length made of word characters (a-z, A-Z, 0-9 and underscore), e.g. Meter. This suits my use case, as the user may enter cm or Centimetre.

string Pattern = @" (?<width> [\d*\.'\s]* ) \s* x \s* (?<height> [\d*\.'\s]* ) \s* (?<unit> \w+ )";
RegexOptions Options = RegexOptions.Compiled | RegexOptions.ExplicitCapture |
                       RegexOptions.IgnorePatternWhitespace;
Regex RegularExpression = new Regex(Pattern, Options);
string result = RegularExpression.Replace(input, Format);

This uses the Format(Match m) function from sunside's post:

string Format(Match m) {

    if (!decimal.TryParse(m.Groups["width"].Value, NumberStyles.Any, CultureInfo.CurrentCulture, out var w)){
        return m.ToString();
    }

    if (!decimal.TryParse(m.Groups["height"].Value, NumberStyles.Any, CultureInfo.CurrentCulture, out var h)){
        return m.ToString();
    }

    return $"{w} x {h} {m.Groups["unit"].Value}";
}
Tim

2 Answers

How about the following:

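// Requires: using System; using System.Text.RegularExpressions;

// Captures both integers; the trailing .* swallows the rest of the input (the unit).
const string pattern = @"(?'length'\d+)x(?'width'\d+).*";

static void Main(string[] args)
{
    string input = "08x09cm";
    string output = Regex.Replace(input, pattern, ReplaceCC);
    Console.WriteLine(output); // 8 x 9 cm
}

// Match evaluator: int.Parse drops the leading zeros; the unit is hard-coded to cm.
static string ReplaceCC(Match m)
{
    int length = int.Parse(m.Groups["length"].Value);
    int width = int.Parse(m.Groups["width"].Value);
    return string.Format("{0} x {1} cm", length, width);
}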
jdweng

While the answer related to using a match evaluator delegate / function in Regex.Replace() to resolve the substitution is pretty smooth, I'd like to add another take that uses regex only.

There are a couple of assumptions here:

  • A value can be prefixed by any sequence of zeros, which is to be ignored; 0+ is to be treated as 0.
  • A value suffixed by zeroes keeps those zeroes, such that e.g. 1 and 10000 are different values (that one's a bit of a no-brainer).

With that, we have two paths:

  • Any zero-prefix followed by a single zero captures only the last occurrence of 0. This could be written as 0*? (?<value> 0).
  • Any zero-prefix followed by a non-zero digit, followed by any sequence of digits ignores the leading zeroes. This could be written as 0*? (?<value> [1-9]\d*).

By using the fact that only one of the two conditions will ever be met, we can combine both patterns e.g. using a pattern like this:

0*? ( (?<value> [1-9]\d* ) | (?<value> 0 ) )
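To see the combined pattern at work on its own, here is a minimal sketch. The class name LeadingZeroDemo and the method Strip are made up for illustration. The \b anchors are an addition for this demo only: without something after the value to anchor on (the full pattern uses the following "x" for that), the lazy 0*? would happily match just the first zero of "08" and leave the rest untouched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingZeroDemo
{
    // Same alternation as above. IgnorePatternWhitespace allows the spaces in the
    // pattern, ExplicitCapture keeps the unnamed outer group from capturing.
    // The \b word boundaries are an assumption of this demo so that each match
    // always covers one whole number.
    private static readonly Regex Value = new Regex(
        @"\b 0*? ( (?<value> [1-9]\d* ) | (?<value> 0 ) ) \b",
        RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);

    // Replaces every standalone number with its value group, dropping leading zeros.
    public static string Strip(string input) => Value.Replace(input, "${value}");
}
```

For example, LeadingZeroDemo.Strip("08 and 000 and 00100") should give "8 and 0 and 100", while "10000" stays as it is.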

This is already the key to the solution. The following code tells the regex engine to ignore whitespace in the pattern for readability (so actual whitespace has to be matched explicitly with \s) and suppresses all unnamed groups.

private const string Pattern1 = @"
            0*? ( (?<length> [1-9]\d* ) | (?<length> 0 ) )
        \s* x
        \s* 0*? ( (?<width>  [1-9]\d* ) | (?<width> 0 ) )
        \s* (?<unit> cm )";

private const RegexOptions Options = RegexOptions.Compiled | 
                                     RegexOptions.ExplicitCapture | 
                                     RegexOptions.IgnorePatternWhitespace;

private static readonly Regex Regex = new Regex(Pattern1, Options);

With this, you can then run the replacements using

var result = Regex.Replace(input, @"${length} x ${width} ${unit}");

A side note: The \d+-like patterns obviously do not capture any culture specific formatting of numbers at all. This definitely starts to get ugly when the requirement shifts from integral numbers to fractional ones. For example, the number 202667.4 could be represented as 202.667,40 in a German culture setting (ignoring the thousands separator, which may or may not already slap one in the face). Nobody should write numbers like this, but here we are.
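To make the culture issue concrete, here is a small sketch. It uses a hand-built NumberFormatInfo that mimics the German separators instead of a real CultureInfo, purely so that the example behaves the same on every machine; the names CultureDemo and ParseGermanLike are hypothetical.

```csharp
using System;
using System.Globalization;

public static class CultureDemo
{
    // Assumption for this demo: comma as decimal separator, dot as group
    // separator. Real code would normally pass a CultureInfo instead.
    private static readonly NumberFormatInfo GermanLike = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = ".",
    };

    // NumberStyles.Number allows both the decimal point and group separators.
    public static decimal ParseGermanLike(string text) =>
        decimal.Parse(text, NumberStyles.Number, GermanLike);
}
```

With this, CultureDemo.ParseGermanLike("202.667,40") yields the value 202667.40, whereas a \d+(\.\d+)? regex would only ever see "202.667" in that string.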

Edit

Since the question was changed to support fractional numbers as well, the naive change would be to add (\. \d+)? to the regexes. However, as mentioned above, cultural differences are weird - this is where the already suggested match evaluator comes in.
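For reference, the naive regex-only change could look roughly like the sketch below. It assumes a plain "." as decimal separator and no thousands separators, which is exactly the limitation described above; the names FractionalDemo, Dimensions and Reformat are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FractionalDemo
{
    // Pattern1 with an optional fractional part (\.\d+)? added to each branch.
    // Assumption: "." is the only decimal separator that has to be supported.
    private const string Pattern = @"
        0*? ( (?<length> [1-9]\d* (\.\d+)? ) | (?<length> 0 (\.\d+)? ) )
        \s* x
        \s* 0*? ( (?<width> [1-9]\d* (\.\d+)? ) | (?<width> 0 (\.\d+)? ) )
        \s* (?<unit> cm )";

    private static readonly Regex Dimensions = new Regex(
        Pattern, RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);

    // Rewrites "<length>x<width>cm" without leading zeros, using substitution only.
    public static string Reformat(string input) =>
        Dimensions.Replace(input, "${length} x ${width} ${unit}");
}
```

FractionalDemo.Reformat("05.55x55cm") should return "5.55 x 55 cm", and "08x09cm" still becomes "8 x 9 cm".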

In this case, you could try using a much more relaxed pattern such as

private const string Pattern2 = @"
            (?<length> [\d\.,]+ )
        \s* x
        \s* (?<width> [\d\.,]+ )
        \s* (?<unit> cm )";

private static readonly Regex Regex2 = new Regex(Pattern2, Options);

and then pass the formatting to the delegate:

var replaced = Regex2.Replace(input, Format);

string Format(Match m)
{
    if (!decimal.TryParse(m.Groups["length"].Value, NumberStyles.Any, CultureInfo.CurrentCulture, out var length))
    {
        return m.ToString();
    }

    if (!decimal.TryParse(m.Groups["width"].Value, NumberStyles.Any, CultureInfo.CurrentCulture, out var width))
    {
        return m.ToString();
    }

    var unit = m.Groups["unit"].Value;
    return $"{length} x {width} {unit}";
}

Note the use of decimal to avoid rounding issues, as well as NumberStyles.Any and CultureInfo.CurrentCulture to parse the numbers in the user's format. The Format() method also gracefully returns the original match unchanged if either number is not in a valid format; if this method threw an exception instead, the whole Regex.Replace() call would be aborted.

Also note that the above code does not try to deal with whitespace within numbers, such as 10 000. If you want to support formats like these, you'd have to deal with it in the match evaluator method; specifically, decimal.TryParse() may not interpret 10 000 as a valid number, depending on the selected CultureInfo.
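One possible way to handle this inside the match evaluator is to drop all whitespace from the captured text before parsing. The helper below is only a sketch under that assumption; LooseNumber and TryParseLoose are hypothetical names, and the pattern itself would also have to allow \s inside the number groups for such input to be captured in the first place.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class LooseNumber
{
    // Removes every whitespace character (including non-breaking spaces)
    // and then parses what is left with the given format provider.
    public static bool TryParseLoose(string raw, IFormatProvider provider, out decimal value)
    {
        var compact = string.Concat(raw.Where(c => !char.IsWhiteSpace(c)));
        return decimal.TryParse(compact, NumberStyles.Number, provider, out value);
    }
}
```

Calling LooseNumber.TryParseLoose("10 000", CultureInfo.InvariantCulture, out var v) succeeds with v equal to 10000. The trade-off is that input like "1 2 3" is silently read as 123, so this only makes sense when spaces are known to be digit grouping.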


For fun and profit, here are some test inputs:

var inputs = new[]
{
    "08x09cm",
    "0x10cm",
    "0 x10cm",
    "0x 0cm",
    "000.00 x 00000cm",
    "let's look at 10000x00100cm as well",
    "no x way cm",
    ".0 x 1. cm",
};

foreach (var input in inputs)
{
    Console.WriteLine("- input:  {0}\n  result: {1}\n  result: {2}", input, Reformat1(input), Reformat3(input));
}

This would print

- input:  08x09cm
  result: 8 x 9 cm
  result: 8 x 9 cm
- input:  0x10cm
  result: 0 x 10 cm
  result: 0 x 10 cm
- input:  0 x10cm
  result: 0 x 10 cm
  result: 0 x 10 cm
- input:  0x 0cm
  result: 0 x 0 cm
  result: 0 x 0 cm
- input:  000.00 x 00000cm
  result: 000.0 x 0 cm
  result: 0.00 x 0 cm
- input:  let's look at 10000x00100cm as well
  result: let's look at 10000 x 100 cm as well
  result: let's look at 10000 x 100 cm as well
- input:  no x way cm
  result: no x way cm
  result: no x way cm
- input:  .0 x 1. cm
  result: .0 x 1. cm
  result: 0.0 x 1 cm

(This is mostly testing sunny cases, but it's a start.)

sunside